
PHYSICAL REVIEW RESEARCH 3, 023010 (2021)

Generative machine learning with tensor networks: Benchmarks on near-term quantum computers

Michael L. Wall, Matthew R. Abernathy, and Gregory Quiroz


The Johns Hopkins University Applied Physics Laboratory, Laurel, Maryland 20723, USA

(Received 16 October 2020; revised 3 March 2021; accepted 7 March 2021; published 2 April 2021)

Noisy, intermediate-scale quantum (NISQ) computing devices have become an industrial reality in the last few
years, and cloud-based interfaces to these devices are enabling the exploration of near-term quantum computing
on a range of problems. As NISQ devices are too noisy for many of the algorithms with a known quantum
advantage, discovering impactful applications for near-term devices is the subject of intense research interest.
We explore a quantum-assisted machine learning (QAML) workflow using NISQ devices through the perspective
of tensor networks (TNs), which offer a robust platform for designing resource-efficient and expressive machine
learning models to be dispatched on quantum devices. In particular, we lay out a framework for designing and
optimizing TN-based QAML models using classical techniques, and then compiling these models to be run
on quantum hardware, with demonstrations for generative matrix product state (MPS) models. We put forth
a generalized canonical form for MPS models that aids in compilation to quantum devices, and demonstrate
greedy heuristics for compiling with a given topology and gate set that outperform known generic methods in
terms of the number of entangling gates, e.g., CNOTs, in some cases by an order of magnitude. We present an
exactly solvable benchmark problem for assessing the performance of MPS QAML models and also present an
application for the canonical MNIST handwritten digit dataset. The impacts of hardware topology and day-to-day
experimental noise fluctuations on model performance are explored by analyzing both raw experimental counts
and statistical divergences of inferred distributions. We also present parametric studies of depolarization and
readout noise impacts on model performance using hardware simulators.

DOI: 10.1103/PhysRevResearch.3.023010

I. INTRODUCTION

In recent years, gate-based quantum computing has emerged as a relatively mature technology, with many platforms offering cloud-based interfaces to machines with a few to dozens of qubits [1–5], as well as classical emulators of quantum devices of this class [6]. Today’s quantum computing resources remain a long way from the millions of qubits [7] required to perform canonical quantum computing tasks such as integer factorization with error correction [8,9], and present devices are either engineered with a specific demonstration goal or designed for general-purpose research-scale exploration [10]. With the advent of noisy, intermediate-scale quantum (NISQ) devices [11], whose hardware noise and limited qubit connectivity and gate sets pose challenges for demonstrating scalable universal quantum computation, we are faced with a different form of quantum application discovery in which algorithms need to be robust to noise, limited qubit connectivity and gate sets, and highly resource-efficient.

Machine learning (ML) has been put forward as a possible application area for NISQ devices, with a range of recent proposals [12–14]. ML may prove promising for NISQ applications because well-performing ML algorithms feature robustness against noise, quantum circuits can be designed for ML applications that are highly qubit-efficient [15], and quantum models can be designed whose expressibility increases exponentially with qubit depth [15,16]. The most impactful near-term ML application likely lies in quantum-assisted machine learning (QAML), in which a quantum circuit’s parameters are classically optimized based on measurement outcomes that may not be efficiently classically simulable [17]; this also includes kernel-based learning schemes with a quantum kernel [18]. Tensor networks (TNs) provide a robust means of designing such parameterized quantum circuits that are quantum-resource efficient and can be implemented and optimized on classical or quantum hardware. TN-based QAML algorithms hence leverage the significant research effort into optimization strategies for TNs [19–21], and also enable detailed benchmarking and design of QAML models classically, with a smooth transition to classically intractable models.

In this work, we explore the applicability of QAML with TN architectures on NISQ hardware and hardware simulators, exploring the effects of present-day and near-term hardware noise, qubit connectivity, and restrictions on gate sets. While previous studies have investigated the impacts of noise on ML model outcomes [15,22], our exposition significantly increases the fidelity of such analyses by obtaining an explicit representation of the optimal TN model with fixed resources for a given target quantum hardware. Also in contrast to previous hardware demonstrations on a classification task [22], we focus on fully generative unsupervised learning tasks, which have been identified as a promising avenue for QAML [13], and focus on the most resource-efficient matrix product state (MPS) TN topology. We present a framework for QAML, outlined in Fig. 1, that includes translation of classical data into quantum states, optimization of an MPS model using classical techniques, the conversion of this classically trained model into a sequence of isometric operations to be performed on quantum resources, and the optimization and compilation of these isometric operations into native operations for a given hardware topology and allowed gate set. In particular, we develop several novel techniques for the compilation stage aimed at TN models for QAML on NISQ devices, such as the permutation of auxiliary quantum degrees of freedom in the TN to optimize mapping to hardware resources and heuristics for the translation of isometries into native operations using as few entangling operations (e.g., CNOTs) as possible.

FIG. 1. Overview of QAML workflow. Classical data in (a) is preprocessed and transformed to quantum states embedded in an exponentially large Hilbert space in (b). A TN model is learned from a collection of quantum training data in (c), which has the interpretation in (d) of a sequential preparation scheme involving a small number of readout qubits coupled to ancillary resources. The isometries of the sequential preparation scheme in (d) are conditioned using inherent freedom in the TN representation in (e), and then converted into native gates for a target hardware architecture, displayed as the IBMQ-X2 processor for concreteness. Running on cloud-based hardware in (f), we obtain measurements defining output predictions, as in (g). For interpretation of graphical representations, see text.

The tools developed herein enable the robust design and performance assessment of QAML models on NISQ devices in the regime where classical simulations are still possible, and will inform architectures and noise levels for scaling to the classically intractable regime. Even in the classically intractable regime in which the model must be optimized using a quantum device in a hybrid quantum/classical loop [23,24], our techniques provide a means of obtaining an approximate, classically trained “preconditioner” for the quantum models that can help avoid local minima and reduce optimization time, as has been explored using classical simulations for other architectures in Ref. [22]. We present exemplar results for our workflow for synthetic data that can be described by an exactly solvable two-qubit MPS QAML model, as well as on features extracted from the canonical MNIST handwritten digit dataset [25].

The remainder of this work is organized as follows: Sec. II discusses QAML with tensor networks (TNs) broadly, including embedding of classical data into quantum states, classical training of a TN model, and the conversion of TN models into resource-efficient sequential preparation schemes; Sec. III discusses our approach for compiling TN-based QAML models for running on quantum hardware, including the utilization of ambiguity in the TN representation and greedy compilation heuristics for minimizing model gate depth; Sec. IV presents an exactly solvable two-qubit QAML model and assesses the performance of our QAML workflow on quantum hardware; in Sec. V, we give an example application to generative modeling of features extracted from the MNIST dataset and analyze the performance of our models as a function of hardware noise using a quantum hardware simulator; finally, in Sec. VI, we conclude and give an outlook. Details of the MNIST model studied in Sec. V are given in Appendix E.

Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article’s title, journal citation, and DOI. 2643-1564/2021/3(2)/023010(29)


II. QUANTUM-ASSISTED MACHINE LEARNING WITH TENSOR NETWORKS

Figure 1 broadly outlines the QAML workflow explored in the present work. We begin with a collection of classical data vectors in a training set $T = \{\mathbf{x}_j\}_{j=1}^{N_T}$, where each element $\mathbf{x}_j$ is an $N$-length vector. The first step in our QAML workflow is to define a mapping of classical data vectors to vectors in a quantum Hilbert space. Here, the only restriction we will place on the encoding of classical data in quantum states is that each classical data vector is encoded in an unentangled product state. This is useful for several reasons. For one, unentangled states are the simplest to prepare experimentally with high fidelity, and also enable us to use qubit-efficient sequential preparation schemes. From a learning perspective, encoding individual data vectors in product states ensures that any entanglement that results in a quantum model comes from correlations in an ensemble of data and not from a priori assumptions about pre-existing correlations for individual data vectors [26].

The simplest case of a quantum embedding occurs when the data is discrete, and so can be formulated as vectors $\mathbf{x}$ whose individual elements $x_i \in \{0, 1\}$. In this case each element is mapped to a qubit as $|x_i\rangle$, such that the embedding of the full $N$-dimensional classical data vector into a register of $N$ qubits is [27,28]

$$|\mathbf{x}\rangle = \bigotimes_{i=0}^{N-1} |x_i\rangle . \tag{1}$$

Embeddings can also be formulated for vectors of continuous variable data, as has been explored in Refs. [26,29–31]. We will not review those cases here, as only the binary embedding Eq. (1) is utilized in this work.

A. Tensor networks and sequential preparation

The next step in our QAML workflow outlined in Fig. 1 is to learn a quantum model for the collection of quantum states $\{|\mathbf{x}_j\rangle\}_{j=1}^{N_T}$ resulting from applying the encoding map from the previous section to the training data. Here, we define a quantum model as a collection of operations applied to quantum resources to produce a state that encodes the properties of the ensemble $\{|\mathbf{x}_j\rangle\}$. In what follows, we specialize to the case of tensor network (TN) models, which provide a convenient parametrization of the structure of quantum operations and resources. Generally speaking, TNs represent the high-rank tensor describing a quantum wave function in a specified basis as a contraction over low-rank tensors, and hence define families of low-rank approximations whose computational power can be expressed in terms of the maximum dimension of any contracted index $\chi$, known as the bond dimension.

A wide variety of TN topologies have been considered which are able to efficiently capture certain classes of quantum states [19–21]; in the present work, we focus on matrix product states (MPSs). MPSs use a one-dimensional TN topology, as shown using the Penrose graphical notation for tensors [19] in Fig. 1(c), and form the basis for the enormously successful density matrix renormalization group (DMRG) algorithm in quantum condensed matter physics [32]. MPSs have several properties that make them attractive for QAML. For one, they are undoubtedly the most well-understood and mature of all tensor networks, which has led to robust optimization strategies that are widely used in the quantum many-body community. In addition, MPSs are highly quantum resource efficient, in the sense that their associated wave functions can be sequentially prepared, and so qubits can be reused in deployment on quantum hardware. In fact, it can be shown that every state that can be sequentially prepared can be written as an MPS [33–35]. This last point importantly means that any QAML model that achieves the optimal $O(1)$ scaling of qubit requirements with data vector length $N$ using sequential preparation can be expressed as an MPS, and the methods and analyses presented in this work apply.

In recent years, TNs have found applications outside of the condensed matter and quantum information domains. The mathematical analysis community has proposed TN methods for data analysis, e.g., large-scale principal component analysis [36,37]. In this community, MPSs are referred to as tensor trains [38]. Using TN methods to design quantum-inspired ML models was first proposed by Stoudenmire and Schwab [26], who put forth a scheme using an MPS network as a linear classifier in a Hilbert space whose dimension is exponentially large in the length of the raw data vector. Since then, many other proposals for quantum-assisted or quantum-inspired TN ML models have appeared in the literature [22,39–56], including generative modeling of binary data using MPSs in Ref. [57]. In the majority of approaches, DMRG-inspired algorithms for optimization have been employed. However, the authors of Ref. [58] recently demonstrated an alternate strategy where a TN was implemented as a neural network using standard deep learning software, and the tensors of the TN were optimized using backpropagation strategies ubiquitous in classical ML. While this strategy has shown good performance, it has also been shown to be suboptimal with respect to the DMRG-like approach [59]. Nonetheless, the use of deep learning “preconditioners” and the intersection of QAML and neural networks remains intriguing [60–62].

The fact that MPSs define a sequential preparation scheme means that MPSs provide highly resource efficient architectures for learning [15] and quantum simulation [63]. In particular, the qubit resource requirements for an MPS model are logarithmic in the bond dimension $\chi$, which encapsulates the expressivity of the model, and are independent of the length of the input data vector $N$. The details of the procedure connecting MPS models and sequential preparation schemes are reviewed in Appendix A, where it is shown that a single “physical” or “readout” qubit together with a $\chi$-level ancilla (which itself can be made of multiple qubits) suffices to extract information from an MPS model, provided we are not measuring any multiqubit properties of the state. As shown in Appendix A, a sequential preparation procedure defined as an MPS takes the form

$$|\psi\rangle = \sum_{j_0 \ldots j_{N-1}} \mathrm{Tr}\left[L^{[0] j_0} \cdots L^{[N-1] j_{N-1}}\right] |j_0 \ldots j_{N-1}\rangle , \tag{2}$$

in which the $\chi \times \chi$ matrices $L^{[i] j_i}$ satisfy the isometry condition

$$\sum_{j_i} L^{[i] j_i \dagger} L^{[i] j_i} = I_{\chi} . \tag{3}$$
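Eqs. (2) and (3) can be checked numerically on a small example. The following is a minimal sketch and not the authors' code: the helper names (`random_left_canonical_mps`, `isometry_defect`, `amplitude`) and the use of a QR factorization to generate random tensors are assumptions made for illustration. It builds real tensors that satisfy the isometry condition of Eq. (3), evaluates the coefficients of Eq. (2) by multiplying matrices and taking the trace, and compares the brute-force squared norm with the value obtained from transfer matrices. With the trace form of Eq. (2) the squared norm need not equal one, so the sketch reports it rather than assuming normalization.

```python
import itertools
import numpy as np


def random_left_canonical_mps(n_sites, chi, d=2, seed=0):
    """Return n_sites real tensors of shape (d, chi, chi) obeying Eq. (3)."""
    rng = np.random.default_rng(seed)
    tensors = []
    for _ in range(n_sites):
        # QR of a (d*chi) x chi Gaussian matrix yields orthonormal columns.
        q, _ = np.linalg.qr(rng.normal(size=(d * chi, chi)))
        # Rows j*chi ... (j+1)*chi - 1 of q form the chi x chi block L^{[i] j}.
        tensors.append(q.reshape(d, chi, chi))
    return tensors


def isometry_defect(site_tensor):
    """Frobenius norm of sum_j L^{j dagger} L^{j} - I, cf. Eq. (3)."""
    chi = site_tensor.shape[-1]
    gram = sum(Lj.conj().T @ Lj for Lj in site_tensor)
    return np.linalg.norm(gram - np.eye(chi))


def amplitude(tensors, bits):
    """Coefficient Tr[L^{[0] j_0} ... L^{[N-1] j_{N-1}}] of |j_0 ... j_{N-1}> in Eq. (2)."""
    product = np.eye(tensors[0].shape[-1])
    for site_tensor, j in zip(tensors, bits):
        product = product @ site_tensor[j]
    return np.trace(product)


n_sites, chi = 4, 3
mps = random_left_canonical_mps(n_sites, chi)
max_defect = max(isometry_defect(t) for t in mps)

# Brute-force squared norm: sum of |amplitude|^2 over all 2^N bit strings.
amplitudes = {bits: amplitude(mps, bits)
              for bits in itertools.product((0, 1), repeat=n_sites)}
norm_brute_force = sum(abs(a) ** 2 for a in amplitudes.values())

# Same quantity from transfer matrices E_i = sum_j L^{j} (x) conj(L^{j}).
transfer = np.eye(chi * chi)
for site_tensor in mps:
    transfer = transfer @ sum(np.kron(Lj, Lj.conj()) for Lj in site_tensor)
norm_transfer = np.trace(transfer).real

print(f"max isometry defect: {max_defect:.2e}")
print(f"squared norm (brute force / transfer matrix): "
      f"{norm_brute_force:.6f} / {norm_transfer:.6f}")
```

The brute-force sum scales as $2^N$ and is only there as a cross-check; the transfer-matrix contraction costs a number of operations linear in $N$, which is the property that makes classical training of MPS models practical.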


Those familiar with the theory of MPSs will recognize Eq. (3) as defining the left-canonical form of an MPS; standard procedures exist for putting an MPS into this form [19–21]. The condition of not measuring any multiqubit properties of the state holds for our specified use case of generating data vector samples a single element at a time, as shown in Fig. 1(d). Here, the single physical qubit is coupled to three ancilla qubits forming a $\chi = 8$-level resource, and the physical qubit is measured, with the measurement outcome forming the data sample element $x_{N-1}$. The ancilla qubits are left unmeasured, the physical qubit is reinitialized, and the procedure of coupling the physical and ancilla qubits and measuring the physical qubit is repeated, resulting in the remaining elements of the data vector $x_{N-2}, \ldots, x_0$. The process of coupling the physical $\{|j_i\rangle\}$ and ancilla $\{|\alpha\rangle\}$ states is defined by the isometric operators

$$\hat{L}^{[i]} = \sum_{\alpha \beta j_i} L^{[i] j_i}_{\alpha\beta} \, |\alpha j_i\rangle \langle \beta 0| . \tag{4}$$

The sequential preparation scheme with a single readout qubit requires mid-circuit measurement and reset, which is not universally available in present-day hardware, but has been demonstrated in, e.g., trapped ion platforms [64]. Finally, we stress that the scheme for converting from an MPS model to a sequential preparation procedure is formal in the sense that it produces isometries acting on quantum resources without reference to their actual physical representation or other hardware constraints such as limited coherence time, connectivity, gate sets, etc. The translation of these formal isometries into operations to be dispatched on a given target hardware is detailed in Sec. III.

B. Generative MPS models and classical training procedure

We now further specialize to generative models, in which a collection of quantum data vectors are encoded into a wave function $|\psi\rangle$ such that the probability distribution evaluated at data vector $\mathbf{x}$ is

$$P(\mathbf{x}) = \frac{\langle\psi|\mathbf{x}\rangle \langle\mathbf{x}|\psi\rangle}{Z} . \tag{5}$$

Here, $Z = \langle\psi|\psi\rangle = \sum_{\mathbf{x}} \langle\psi|\mathbf{x}\rangle \langle\mathbf{x}|\psi\rangle$ is a normalization factor, and $|\mathbf{x}\rangle$ denotes the encoding of a classical data vector $\mathbf{x}$ into a quantum state as in Eq. (1). As this corresponds to Born’s rule for measurement outcomes, the resulting structure is referred to as a Born machine [57,65]. The state $|\psi\rangle$, or, equivalently, the process of generating $|\psi\rangle$ from a known fiducial state, is our quantum machine learning model.

In order to discuss data representation using Born machines, we define the average log-likelihood of the data in the training set $T$ as

$$\mathcal{L}(T) = \frac{1}{N_T} \sum_{\mathbf{x} \in T} \ln\left( \frac{\langle\psi|\mathbf{x}\rangle \langle\mathbf{x}|\psi\rangle}{Z} \right) . \tag{6}$$

The minimization of the negative log-likelihood with respect to the parameters in our Born machine is equivalent to maximizing the probability that the data is generated by the Born machine. Parameterizing the wave function $|\psi\rangle$ as an MPS as

$$|\psi\rangle = \sum_{i_0 \ldots i_{N-1}} \mathrm{Tr}\left[A^{i_0 \dagger} \cdots A^{i_{N-1} \dagger}\right] |i_0 \ldots i_{N-1}\rangle , \tag{7}$$

and using the binary encoding of data, Eq. (1), we find

$$\mathcal{L}(T) = \frac{1}{N_T} \sum_{\mathbf{x} \in T} \ln\left( \frac{1}{Z} \left| \mathrm{Tr}\left[A^{x_0 \dagger} \cdots A^{x_{N-1} \dagger}\right] \right|^2 \right) ,$$

where the normalization factor (partition function) is

$$Z = \sum_{i_0 \ldots i_{N-1}} \mathrm{Tr}\left[A^{i_0 \star} \cdots A^{i_{N-1} \star}\right] \mathrm{Tr}\left[A^{i_0} \cdots A^{i_{N-1}}\right] . \tag{8}$$

We will optimize the Born machine by a DMRG-style procedure using gradient descent, where the gradient is taken with respect to the tensors of the MPS. Namely, we will consider the gradient with respect to a group of $s$ neighboring tensors $\Theta = A^{i_l} \cdots A^{i_{l+s}}$, with $s$ typically being one or two, noting that the gradient of an object with respect to a tensor is a tensor whose elements are the partial derivatives with respect to the individual tensor elements. We take the gradient with respect to the conjugates of the tensors $A^{i_j \star}$, formally considering these conjugates independent of the tensors themselves. This gradient may be written as

$$\nabla_{\Theta^\star} \mathcal{L}(T) = \frac{1}{N_T} \sum_{\mathbf{x} \in T} \frac{\nabla_{\Theta^\star} \langle\psi|\mathbf{x}\rangle \langle\mathbf{x}|\psi\rangle}{\langle\psi|\mathbf{x}\rangle \langle\mathbf{x}|\psi\rangle} - \frac{\nabla_{\Theta^\star} Z}{Z} = \frac{1}{N_T} \sum_{\mathbf{x} \in T} \frac{\nabla_{\Theta^\star} \langle\psi|\mathbf{x}\rangle \langle\mathbf{x}|\psi\rangle}{\langle\psi|\mathbf{x}\rangle \langle\mathbf{x}|\psi\rangle} - \sum_{\mathbf{x}} \frac{\nabla_{\Theta^\star} \langle\psi|\mathbf{x}\rangle \langle\mathbf{x}|\psi\rangle}{Z} . \tag{9}$$

With this gradient in hand, we update the local block of tensors as

$$\Theta \to \Theta + \eta \nabla_{\Theta^\star} \mathcal{L}(T) , \tag{10}$$

in which $\eta$ is a learning rate (note that this is equivalent to minimizing the negative log-likelihood). For the single-site algorithm ($s = 1$), this update does not change the bond dimension or canonical form of the MPS. For the two-site algorithm ($s = 2$), we can now split the updated tensor $\Theta$ into its component MPS tensors as

$$\Theta^{ij}_{\alpha\beta} = \sum_{\mu} A^{i}_{\alpha\mu} A^{j}_{\mu\beta} , \tag{11}$$

using, e.g., the SVD. Hence, the addition of the gradient can increase the bond dimension, and thus the representation power, adaptively based on the data. The bond dimension can also be set implicitly by requiring that the $L_2$-norm of the tensor $\Theta$ is represented with a bounded relative error $\varepsilon$. The above update has affected only a small group of the tensors with all others held fixed. A complete optimization cycle, or “sweep,” occurs when we have updated all tensors twice, moving in a back-and-forth motion over the MPS. The sweeping process is converged once the negative log-likelihood no longer decreases substantially. Example convergence behavior will be given later in Sec. V.

III. COMPILATION OF MPS MODELS FOR QUANTUM HARDWARE

In this section, we address how to take an MPS model resulting from the classical optimization procedure outlined in Sec. II B and convert it into a sequence of operations to be


performed on a quantum device. We will refer to this operation as quantum compilation. Many modern NISQ software ecosystems and other stand-alone applications [1,2,4,66–68] have routines for compiling quantum instructions, usually required to be supplied in the form of an abstract quantum circuit model. These compilers typically perform multiple passes through the abstract circuit to map virtual qubits from the abstract model onto the hardware qubits of the device, route operations between the virtual qubits to hardware qubits, e.g., by placing SWAP gates, and minimize some property of the circuit, such as entangling gate count. We note that quantum compilation remains an active area of research, and currently available generic methods for quantum compilation tend to produce “deep” circuits with significant numbers of entangling gates.

There are several properties of our particular quantum computing use case (compiling the isometries encoding TN models for QAML) that distinguish it from traditional quantum computing use cases. For one, the isometric operations are defined on the Hilbert space of a physical qubit and a formal $\chi$-level ancilla, and so may not uniquely describe an isometric operation on a set of virtual qubits, e.g., when $\chi$ is not a power of 2. Further, since the ancilla degrees of freedom are never directly measured, there is no preferred basis or state ordering for these states. Both of these properties give freedom that can be utilized to simplify compilation. In addition, the isometries are the result of an optimization procedure that has a finite tolerance (see Sec. II B), and so do not need to be compiled exactly to meet some fine-tuned property. That is to say, model predictions are not more accurate when using a compiled unitary that matches the isometry better than the optimization tolerance. For NISQ devices in particular, fine-tuning of isometry properties through the introduction of additional entangling gates may in fact produce worse results due to the increased noise in the circuit compared to a shallower representation. These properties have motivated us to pursue optimizations of the tensor network structure as well as a set of greedy compilation heuristics, inspired by Ref. [69], that we outline in what follows.

The key objects that we want to optimize in this section are the isometries $\hat{L}^{[i]}$ defined by the elements of the MPS in left-canonical form, see Eq. (4) of Sec. II A. Given that the binary encoding map used in this work, Eq. (1), is real-valued, all MPS tensors are real-valued, and this extends to the isometries. We will display the isometries using plots of the elements of their matrix representations in a fixed basis, as in Fig. 2. In this and similar plots, the basis ordering is defined with the physical qubit (i.e., the qubit that begins in the $|0\rangle$ state and is read out after each isometric operation) as the least significant qubit such that an isometry acting on a $\chi$-dimensional ancilla $\alpha \in \{0, \ldots, \chi - 1\}$ and a physical qubit $q \in \{0, 1\}$ has state indices

$$\mathrm{index}(|\alpha q\rangle) = 2\alpha + q . \tag{12}$$

For isometries that have their ancilla states decomposed into qubits, we order those qubits $a_i \in \{0, 1\}$ such that significance increases with label index $i$, i.e.,

$$\mathrm{index}\left(|a_{n_{\mathrm{anc}}} \ldots a_1 q\rangle\right) = \sum_{i=1}^{n_{\mathrm{anc}}} 2^{i} a_i + q . \tag{13}$$

FIG. 2. Example isometry for optimization. An example isometry acting on a single physical qubit in the state $|0\rangle$ and a $\chi = 7$-level ancilla, taken from the MNIST example in Sec. V. The real-valued matrix representation of this isometry is plotted in the basis with the physical qubit as the least significant qubit, see Eq. (12), and values not defined by the isometry are set to zero (white). The isometry has been cleaned to remove small numerical values resulting from classical optimization, but no further optimization has been applied.

The isometry in Fig. 2 acts on a physical qubit and a $\chi = 7$-dimensional ancilla, transforming the state $|00\rangle$ into a superposition of $|11\rangle$ and $|60\rangle$, the state $|10\rangle$ into $|00\rangle$, and so on. We note that the isometry in Fig. 2 is undefined when acting on states with $|q = 1\rangle$ in accordance with the sequential preparation scheme, but takes arbitrary ancilla states as inputs. Because of the isometry property, we only need to account for the nonzero elements of the operation when matching to a unitary, and so do not need to distinguish between zero elements and undefined elements.

As a first step in compilation, we will want to “clean” the isometries from the classical model in order to remove noise at the level of the classical optimization tolerance; otherwise, we will expend effort attempting to compile this noise into quantum operations that will not improve the fidelity of the calculation. This amounts to implementing a filter on the MPS to remove elements below some tolerance level $\varepsilon$, which can be accomplished by using MPS compression to find the MPS $|\phi\rangle$ with specified resources (e.g., restricted bond dimension $\chi$) that is closest in the $L_2$-norm to a target MPS $|\psi\rangle$ that has higher resource requirements ($\chi'$). While this is optimally done variationally [19], a simple and practical method for performing this operation is to use local SVD compression, in which the MPS tensor of the orthogonality center $A^{[i]}$ is decomposed by the SVD as

$$A^{[i] j_i}_{\alpha\beta} \to \sum_{\mu} U_{(\alpha j_i)\mu} S_{\mu} V_{\mu\beta} , \tag{14}$$

$$A^{[i] j_i}_{\alpha\beta} \to \sum_{\mu} U_{\alpha\mu} S_{\mu} V_{\mu(j_i \beta)} , \tag{15}$$

where the upper expression is for a right-moving update and the lower for a left-moving update. We can truncate the bond dimension by keeping only the $\chi$ largest singular values, or determine the new bond dimension implicitly through a


singular value cutoff $\varepsilon$ as

$$1 - \frac{\sum_{\mu=1}^{\chi} S_{\mu}^2}{\sum_{\mu} S_{\mu}^2} < \varepsilon . \tag{16}$$

When the MPS tensor is the orthogonality center, this condition is equivalent to an $L_2$-norm optimization of the full wave function. Replacing $A^{[i]}$ by the truncated $U$ for a right-moving update or by $V$ for a left-moving update and contracting the truncated $SV$ or $US$ into the neighboring tensor completes the local optimization. Sweeping the optimization across all tensors completes the filtering step. Since the optimization only deals with the parameters of a single MPS tensor at a time, it is not guaranteed to be globally optimal, but this simple procedure works well in practice. As a side benefit, ending the optimization by applying the update Eq. (14) and replacing the MPS tensor $A^{[i]}$ with $U$ for each tensor places the MPS in left-canonical form, from which the isometries for sequential preparation can be constructed from the tensor elements [see Eq. (2)].

A. Ancilla permutation and the diagonal gauge

The conversion of an MPS into left-canonical form uses the gauge freedom inherent in MPSs to ensure that each of the MPS tensors $L^{[i] j_i}$ satisfies the isometry constraint

$$\sum_{\alpha j} L^{[i] j}_{\alpha\beta} L^{[i] j}_{\alpha\beta'} = \delta_{\beta\beta'} , \tag{17}$$

without changing the overall quantum state. This gauge freedom specifies that any invertible matrix $X$ and its inverse can be placed between any two tensors of the MPS, i.e.,

$$\tilde{A}^{[i] j_i} = A^{[i] j_i} X , \tag{18}$$

$$\tilde{A}^{[i+1] j_{i+1}} = X^{-1} A^{[i+1] j_{i+1}} . \tag{19}$$

However, the constraint Eq. (17) still allows for the insertion of any unitary matrix and its inverse on either the left or right bond basis of an MPS tensor in left-canonical form $L^{[i] j}$ without changing the state or the isometry conditions. This freedom stems from the fact that the bond degrees of freedom are only used to mediate correlations between the physical degrees of freedom and are not directly measured, and so have no preferred basis for representation. We can attempt to exploit this freedom to produce MPS models that are more amenable to compilation on a given target hardware. We note that, just as with the ordinary gauge freedom of MPSs, a change of gauge affects two neighboring MPS tensors at a time, and so an operation that may benefit one tensor also affects its neighbors and so on down the network. Thus, the optimal choice of gauge requires a global optimization across all tensors.

To utilize the ambiguity in the basis representation of the ancilla states, we have devised a simple procedure that we have found to aid in compiling isometries for QAML models. The heuristic guiding our scheme is to ensure that operations are as “diagonal” as possible, in the sense that qubits preferentially remain in their same state rather than being swapped or mixed with other ancilla qubits. Operationally, in order to work only within the ancilla basis where we have freedom of representation, we define a matrix of overlaps

$$M^{[i]}_{\alpha\beta} = \sum_{j} L^{[i] j}_{\alpha\beta} L^{[i] j}_{\alpha\beta} , \tag{20}$$

which “integrates out” the physical qubit from the isometry used for sequential preparation, and so acts only in the ancilla space. A diagonal $M^{[i]}$ is desired, as this would perfectly preserve the individual ancilla basis states and so reduce the number of quantum operations required. Recalling that we are only changing either the left or right basis of $M^{[i]}$ at a time, one possible option to increase its diagonal dominance through transformation of either the left or right basis is to use the polar decomposition $M^{[i]} \to U^{[i]} P^{[i]}$ or $M^{[i]} \to P^{[i]} U^{[i]}$ with $U^{[i]}$ unitary and $P^{[i]}$ Hermitian and positive semidefinite. Using $(U^{[i]})^{1/2}$ to transform the bases of $\{L^{[i] j_i}, j_i = 1, \ldots, d\}$ would transform $M^{[i]}$ into $P^{[i]}$; however, this transformation does not preserve sparsity in the $L^{[i] j_i}$, and we have found that it often leads to more complex operators in practice. Instead, we use the values of $U^{[i]}$ from the polar decomposition to define a permutation of the ancilla basis states as, e.g.,

$$\tilde{L}^{[i] j}_{\alpha,\, \mathrm{argmax}|U^{[i]}_{:\beta}|} = L^{[i] j}_{\alpha\beta} . \tag{21}$$

This operation does preserve sparsity, and results in more diagonal operations in the ancilla degrees of freedom. An example of the isometries for a QAML model with and without this permutation procedure is shown in the right and left panels of Fig. 3, respectively. We see that the permutation of the basis states does result in a more diagonal isometry operator, as desired. A procedure for fixing the overall sign of the isometries, as well as additional ambiguities that can arise for tensors near the end of the representation, are discussed in Appendix B.

As with the usual transformation of MPS gauge to mixed canonical form [19], there is a “right-moving” update that permutes the right bond basis of a tensor $A^{[i]}$ and the left bond basis of $A^{[i+1]}$ and a “left-moving” update that permutes the left bond basis of $A^{[i]}$ and the right bond basis of $A^{[i-1]}$. When applied to all tensors, we say that the MPS is in the diagonal gauge, as it is the gauge which enforces the isometries for state preparation to be as diagonal as possible (according to our particular cost functions). We stress that the MPS is still in left-canonical form, and so the sequential preparation scheme still holds; the diagonal gauge merely uses the unitary freedom remaining in the left-canonical form to further optimize the state preparation procedure while maintaining sparsity. There is a single tensor that is not optimized at a certain location $k$ in the transformation to the diagonal gauge that we call the diagonality center, analogous to the orthogonality center of mixed canonical form. While the location of the diagonality center can again be used as an optimization parameter, we have found it convenient to set the diagonality center to an isometry that is initially an identity matrix. Such an isometry can always be introduced by padding the classical data vectors with a zero at location $k$. The reason for our choice is that the permutation to diagonal gauge will transform this identity isometry into a permutation matrix, which is likely to be easier to compile with high fidelity than a general, nonsparse


isometry. Specific techniques for compilation will be presented in a later section.

FIG. 3. Application of diagonal gauge. Example isometries for an operation with a χ = 7 dimensional ancilla before (top: same isometry as Fig. 2) and after (bottom) applying the diagonal gauge transformation Eq. (21) to the right ancilla basis states.

B. Greedy compilation heuristics

Following the fixing of gauge outlined in the last subsection, we are in a position where we now want to transform the isometries L̂[i] [see Eq. (4)] into operations to be performed on quantum hardware. The target hardware will have a collection of qubits laid out with a given topology and an allowed gate set of single-qubit rotations and entangling gates between pairs of qubits. Generally speaking, two-qubit gates are subject to higher degrees of noise than the single-qubit gates, and so higher-fidelity operations will be obtained by using as few two-qubit gates as possible. As an example, the error map and qubit/gate topology for the IBMQ-X2 machine is shown in Fig. 4. For this device, the single-qubit gates are defined by [4]

$$\hat{U}_3(\theta, \phi, \lambda) = \begin{pmatrix} \cos\frac{\theta}{2} & -e^{i\lambda} \sin\frac{\theta}{2} \\ e^{i\phi} \sin\frac{\theta}{2} & e^{i(\lambda+\phi)} \cos\frac{\theta}{2} \end{pmatrix} , \tag{22}$$

and the two-qubit gates are controlled-NOT (CNOT) gates, which are allowed only between qubits designated with a solid line in Fig. 4. As shown in the figure, the average error of the CNOT gates at the time of this measurement was ∼2.6%, while the error of the single-qubit gates was ∼0.15%. Hence, a goal in compiling our isometries is to use as few gates as possible, and especially to minimize the number of two-qubit gates.

FIG. 4. Exemplar NISQ hardware architecture. The qubit layout (circles), CNOT coupling topology (lines), and error sources of the five-qubit IBMQ-X2 device [4] as an exemplar NISQ machine. We note that these error rates are a snapshot, and are subject to fluctuations.

In our compilation heuristic, we enumerate possible unitaries by constructing a tree of potential circuit structures with continuous parameters to be optimized. The root node of our tree is comprised of a single-qubit gate [such as the Û3 gate in Eq. (22)] for each qubit. Each node in the tree has a child node corresponding to the placement of an entangling gate in one of its allowed positions, and then adding single-qubit gates to the qubits acted on by the entangling gate. Any circuit that can be constructed using the allowed entangling gates and single-qubit rotations corresponds to a node in this tree, as proved in Ref. [69]. In order to select between nodes in this tree, we define a cost function between the isometry L̂ and a unitary candidate Û

$$C(\hat{U}, \hat{L}) = \sum_{(i,j) \in S} \left| U_{i,j} - L_{i,j} \right|^2 , \tag{23}$$

in which S denotes the set of indices such that the elements of the matrix representation of the isometry are greater than some tolerance |L_{i,j}| > δ. Because of the isometry property of L̂ and the unitarity of the candidate gates Û, we can optimize only over the elements in S, which reduces the computational complexity of the cost function. Our optimization will select a particular unitary Û as being acceptable when the cost function drops below a specified tolerance ε.
The optimization procedure begins by optimizing the root node (single-qubit gates) over its parameters and checking the cost function to see whether an acceptable gate is found. If no acceptable gate is found, a queue of gates corresponding to adding an entangling gate and a pair of single-qubit gates to the root node in all allowed locations as outlined above is formed, and these gates are optimized and their cost functions recorded. If no gate from this queue is acceptable, a priority queue is formed by sorting the gates from this set according to their cost functions and then appending entangling gates and single-qubit rotations as above. In order to avoid an exponential growth


FIG. 5. Comparison of greedy gate compilation procedure with methods of Ref. [70]. (a) Target isometry, which can be completed to a
permutation operator. (b) Matrix plot of result from greedy compilation procedure (cost function ∼2 × 10−15 ). (c) Matrix plot of result from
the methods of Ref. [70]. (d) Quantum circuit representation of greedy compilation procedure result. (e) Quantum circuit representation of
Ref. [70] result.

of the number of search considerations, we limit the number of gates forming the starting point of the priority queue (i.e., before appending new entangling gate and single-qubit rotations) to a fixed number. This number is used as a convergence parameter, and can vary between optimization cycles; we find that it is useful to allow more gates in early optimization cycles where the operations involve fewer parameters and so optimization is fast, and then to decrease the number of kept gates as the circuits become deeper. Further details on our implementation of this procedure and some problem-specific implementations are provided in Appendix C.
Several “generic” methods for the compilation of isometries exist, as reviewed in, e.g., Ref. [71], which can be used as a baseline for comparison. These algorithms also underlie the implementation in QISKIT [4]. In the generic approach, the matrix representation of the isometry is decomposed, e.g., a single column at a time or by the cosine-sine decomposition, and the resulting decompositions expressed in terms of multiqubit controlled operations, which are themselves decomposed into a target gate set using known representations. These approaches are constructive, and so will find decompositions of any isometry in principle, but they are not designed to find the most efficient representation by some metric, e.g., the number of entangling gates. Further, as noted above, the use of such generic algorithms requires an “isometric completion” in the case that the bond dimension χ is not a power of 2, and may expend additional resources in exactly compiling noise in the isometries. Special purpose methods have also been developed for compiling permutation gates in Ref. [70], which have been shown to outperform the generic algorithms in some cases. This method uses a reversible logic synthesis to map the permutation into a reversible circuit comprised of single-target gates, and these single-target gates are then compiled into networks of CNOTs, Hadamard gates, and R̂z(θ) = |0⟩⟨0| + e^{iθ}|1⟩⟨1| rotations.
In order to compare our methods with the generic, constructive method for compiling isometries, we consider the isometry shown in Fig. 3. As noted above, in order to utilize the generic methods we have to map this isometry into a complete isometry over a set of qubits, which requires us to define the action of the isometry on the state in which the ancilla qubits are all in the state |1⟩, which was left unconstrained by the optimization procedure. For simplicity, we use the “isometric completion” in which the operator takes this state to itself without modifying the state of the physical qubit. Using the iso method of the QuantumCircuit class from Qiskit [4] implementing the generic methods of Ref. [71] on the unconstrained ibmq_qasm_simulator hardware topology produces a gate representation with 122 CNOTs at optimization_level 0, and 120 CNOTs at optimization_level 3. The greedy compilation procedure presented in this work achieves a representation with a cost function error of 5.6 × 10^{−10} with an order of magnitude fewer entangling gates for this particular isometry. An explicit circuit representation is given in Fig. 36(d) of Appendix E.
As a point of comparison for the specialized methods for permutation gates studied in Ref. [70], we consider the isometry shown in Fig. 5(a). This is indeed a permutation on the space acted upon, and so can be represented by a family of “unitary completions.” We take the straightforward choice of unitary completion in which we leave the ancilla qubits unchanged by the permutation, as shown in Fig. 5(c).
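As an illustration of the masked cost function of Eq. (23), the following minimal sketch evaluates the cost only on the index set S where the target isometry has entries above the tolerance δ. The storage convention (the isometry embedded in a square matrix whose unconstrained columns are zero), the function name `masked_cost`, and the 4 × 4 example matrices are assumptions made for illustration and are not taken from the authors' implementation.

```python
import cmath
import math


def masked_cost(U, L, delta=1e-12):
    """Sum |U_ij - L_ij|^2 over the entries with |L_ij| > delta, as in Eq. (23)."""
    total = 0.0
    for row_u, row_l in zip(U, L):
        for u, l in zip(row_u, row_l):
            if abs(l) > delta:  # (i, j) belongs to the index set S
                total += abs(u - l) ** 2
    return total


# Hypothetical two-qubit target: only columns 0 and 2 are constrained,
# the remaining columns are stored as zeros (assumed convention).
theta, phi = 0.7, 0.3
c, s = math.cos(theta), math.sin(theta)
ph = cmath.exp(1j * phi)
L = [
    [1, 0, 0, 0],
    [0, 0, ph * s, 0],
    [0, 0, c, 0],
    [0, 0, 0, 0],
]

# A unitary that reproduces the constrained columns exactly.
U_match = [
    [1, 0, 0, 0],
    [0, c, ph * s, 0],
    [0, -ph.conjugate() * s, c, 0],
    [0, 0, 0, 1],
]
# The identity, which misses the constrained column 2.
U_identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

cost_match = masked_cost(U_match, L)
cost_identity = masked_cost(U_identity, L)
print(cost_match, cost_identity)
```

In a search of the kind described above, a candidate would be accepted once this cost falls below the tolerance ε; here the matching unitary has zero cost while the identity does not.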


The result of applying our greedy compilation procedure is given in matrix form in panel (b), and in quantum circuit form in panel (d). This gate requires seven CNOTs, and has a cost function error of ∼2 × 10^{−15}. The result of applying the methods of Ref. [70] is shown in panels (c) and (e); the gate here requires 14 CNOT operators. Generally speaking, we find that our greedy compilation procedure finds comparable or better gates for isometries corresponding to near-diagonal permutations compared to using the methods of Ref. [70] with the straightforward unitary completion given above. However, it is also worth noting that our procedure is designed for isometries and so generally does not produce permutation operators on the entire space at the end of optimization. That is to say, the optimized gate is a permutation in the space spanned by the isometry, but the full unitary is not a permutation, see, e.g., Fig. 35 of Appendix E. It is also worth noting that for complex, highly nondiagonal permutations, as can occur for the diagonality center when transforming to the diagonal gauge, the methods of Ref. [70] can produce more efficient representations.

IV. EXACTLY SOLVABLE BENCHMARK MODEL

As an exactly solvable benchmark, we consider an MPS Born machine encoding the probability distribution of classical discrete data vectors x, x_i ∈ {0, 1} ∀i. The simplest nontrivial situation is when the data vectors consist of all zeros except for a single 1. Let us denote the probability that the 1 resides at location i as p_i, with $\sum_{i=0}^{N-1} p_i = 1$. It can be shown that this data can be represented exactly as a bond dimension 2 MPS Born machine with tensors

$$A^{[0]\,0}_{00} = 1, \quad A^{[0]\,1}_{01} = e^{i\phi_0} \sqrt{p_0} , \tag{24}$$

$$A^{[j]\,0}_{00} = 1, \quad A^{[j]\,1}_{01} = e^{i\phi_j} \sqrt{p_j} , \quad A^{[j]\,0}_{11} = 1 , \tag{25}$$

$$A^{[N-1]\,0}_{10} = 1, \quad A^{[N-1]\,1}_{00} = e^{i\phi_{N-1}} \sqrt{p_{N-1}} , \tag{26}$$

with the {φ_j} denoting arbitrary phases. The presence of a large number of arbitrary phases is a generic feature of TN models for generative applications: since the square of the wave function is used to generate classical data samples, the phase structure of the wave function is generally underconstrained. This in turn implies that TN models can have some flexibility over the particular gate set used to entangle the physical qubits to the ancillae without affecting the sampling outcomes. The exactly solvable model encapsulated by Eqs. (24)–(26) is a useful benchmark both because it is the simplest nontrivial example of a sequentially preparable QAML model, involving a single ancilla qubit, and because it can be exactly solved for any classical data vector length and probabilities p. An example dataset for p = (1/5, 1/20, 1/20, 1/4, 1/5, 1/4) is given in Fig. 6.

FIG. 6. Example dataset for an exactly solvable MPS generative model with a six-dimensional probability vector p = (1/5, 1/20, 1/20, 1/4, 1/5, 1/4).

The construction in Eqs. (24)–(26) is reminiscent of the well-known MPS representation of the W state [35]. In order to convert this generic MPS into a sequential qubit preparation scheme we should place the MPS into left-canonical form. Since the bond dimension is known, we can do so in terms of the QR decomposition. The explicit calculation is relegated to Appendix D, where we also discuss a “hand compiled” version of the resulting isometries to be used as a benchmark for our automated procedures. The outcomes of the calculation are the exact isometries

$$\hat{L}^{[N-1]} = \left( \cos\theta_{N-1} |1_a 0_q\rangle + e^{i\phi_{N-1}} \sin\theta_{N-1} |0_a 1_q\rangle \right) \langle 0_a 0_q| , \tag{27}$$

$$\hat{L}^{[j]} = |0_a 0_q\rangle \langle 0_a 0_q| + \left( e^{i\phi_j} \sin\theta_j |0_a 1_q\rangle + \cos\theta_j |1_a 0_q\rangle \right) \langle 1_a 0_q| , \tag{28}$$

in which $\cos\theta_j = \sqrt{\sum_{i<j} p_i / \sum_{i \le j} p_i}$, $\sin\theta_j = \sqrt{p_j / \sum_{i \le j} p_i}$, and $|x_q\rangle$ and $|y_a\rangle$ denote the states of the physical (sampled) qubit and the ancilla qubit, respectively. A natural unitary completion of the operator L̂[j] is given in the basis $\{|0_a 0_q\rangle, |0_a 1_q\rangle, |1_a 0_q\rangle, |1_a 1_q\rangle\}$ as

$$[\hat{U}^{[j]}] = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta_j & e^{i\phi_j} \sin\theta_j & 0 \\ 0 & -e^{-i\phi_j} \sin\theta_j & \cos\theta_j & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} , \tag{29}$$

which can be compiled exactly using the circuit shown in Fig. 7.

A. Training and compilation

In this section, we detail the application of the methods outlined in this paper to the exactly solvable benchmark in Sec. IV using the probabilities p = (8/31, 18/31, 5/31). To demonstrate our methods on this benchmark case, we train a χ = 2 Born machine using the single-site gradient descent described in Sec. II B and compile it into gates using the procedures of Sec. III with the diagonality center at 1 and a greedy optimization tolerance of 5 × 10^{−4}. Following the usual parlance of MPSs from condensed matter physics, we will refer to the physical indices of the MPS tensors as sites. The results of this procedure are shown in Fig. 8, with panels (a)–(c) for site 0, panels (d)–(f) for site 1, and panels (g)–(i) for site 2. The final cost functions for sites 0, 1, and 2 are 6.7 × 10^{−9}, 7.3 × 10^{−10}, and 2.0 × 10^{−9}, respectively. As a point of comparison, we consider a “hand compiled” version of the unitary completed isometries Eq. (29). Taking φ = 0, we can compile these gates using the circuit shown in Fig. 7.

[Circuit diagram not reproduced. Gate labels recovered from the figure: CNOT controls (•), Ry(θj), Rz(π/2), Ry(θj), Rz(−π/2) on one line; Rz(π/2), Rz(−π/2) on the other.]

FIG. 7. Circuit decomposition for the natural unitary completion Û[j] using a gate set of single-qubit rotations and CNOTs. The upper line is the physical qubit and the lower line is the ancilla.
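The claim that the bond dimension 2 tensors of Eqs. (24)–(26) reproduce the one-hot distribution exactly can be checked numerically. The sketch below contracts the MPS for every bitstring and compares the Born probabilities with p. The function name `born_probabilities`, the randomly drawn phases, and the use of the Fig. 6 probability vector are illustrative assumptions; this is not the authors' code.

```python
import cmath
import itertools
import math
import random


def born_probabilities(p, phases):
    """Contract the bond-dimension-2 MPS of Eqs. (24)-(26) for every bitstring."""
    n = len(p)
    amps = [cmath.exp(1j * ph) * math.sqrt(pj) for ph, pj in zip(phases, p)]
    probs = {}
    for bits in itertools.product((0, 1), repeat=n):
        # v[b] is the amplitude on bond value b
        # (b = 0: no 1 emitted so far, b = 1: the single 1 was already emitted).
        v = [1.0 + 0j, 0j]
        for j, x in enumerate(bits):
            if j == n - 1:
                # Eq. (26): both allowed branches leave through bond value 0.
                v = [v[1], 0j] if x == 0 else [v[0] * amps[j], 0j]
            elif x == 1:
                # A^{1}_{01}: only the 0 -> 1 bond transition emits a 1.
                v = [0j, v[0] * amps[j]]
            # x == 0 on a bulk site: A^{0}_{00} = A^{0}_{11} = 1 leaves v unchanged.
        probs[bits] = abs(v[0]) ** 2
    return probs


p = [1 / 5, 1 / 20, 1 / 20, 1 / 4, 1 / 5, 1 / 4]  # probability vector of Fig. 6
rng = random.Random(0)
phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in p]  # arbitrary phases
probs = born_probabilities(p, phases)

for i, p_i in enumerate(p):
    one_hot = tuple(1 if k == i else 0 for k in range(len(p)))
    print(one_hot, round(probs[one_hot], 6), p_i)
```

Because only the squared modulus of each amplitude enters, the result is independent of the phases φ_j, which is the underconstrained phase structure noted in the text.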


However, we additionally note that with the assumption that the physical qubit starts in the state |0_q⟩, the first CNOT in Fig. 7 is the identity, and so can be neglected, leading to a circuit with three CNOTs. We see that the obtained quantum circuits are substantially different from those obtained by hand compilation of the “natural” unitary completion, but are still of very high fidelity in the space spanned by the isometry. In addition, the gates for sites 0 and 1 are shallower than the hand-compiled gate, which may be anticipated based on known optimality results for two-qubit gates [72].

FIG. 8. Exactly solvable Born Machine benchmark isometries. Isometries and optimized gates for the three-site exactly solvable Born Machine benchmark. (a)–(c) are for site 0, (d)–(f) are for site 1, and (g)–(i) are for site 2. (a), (d), and (g) are the isometries output from the classically trained model, (b), (e), and (h) are matrix plots of the unitaries output by our greedy compilation procedure, and (c), (f), and (i) are circuit representations of the optimized unitaries.

B. Performance of benchmark on cloud-based hardware

In this section, we present results for the exactly solvable benchmark model running on cloud-based NISQ hardware, using IBM devices as an example. We note that the current IBM hardware does not allow measurement and re-initialization during an experimental run, and so our sequential preparation schemes cannot be directly implemented on these devices. However, we can still test our generative models by implementing the gates Û[j] of the sequential preparation scheme on a register of (N + 1) qubits prepared in the |0…0⟩ state, coupling each physical qubit to the same ancilla in order from (N − 1) down to 0. This procedure is limited in practice by the number of available qubits and their connectivity to a single ancilla qubit. However, for devices with a cross-shaped topology, such as the IBMQ-X2, we can readily couple up to 4 qubits to a central ancilla qubit, and for devices with a T-shaped topology, such as the Vigo, we can couple up to 3 qubits to a single ancilla.
Utilizing this approach for the exactly solvable Born Machine model with the probability vector given in Sec. IV A, we find the circuits shown in Fig. 9. Here, the physical qubits (those that are sampled to obtain output classical data vectors) are assigned to be qubits 0, 1, and 3, and the ancilla is qubit 2. The upper panel is for the hand-compiled circuits from Fig. 7, and the lower panel is the circuit from Fig. 8 using the workflow put forth in this work. The dashed vertical lines demarcate the circuits corresponding to the individual sites of the Born machine, but are inessential and neighboring single-qubit rotations can be joined for increased efficiency.
As metrics for assessing the performance of our QAML models, we utilize both the raw experimental counts used to infer measurement probability distributions and a convex version of the Kullback-Leibler (KL) divergence between the ideal (p_T) and estimated (p_N) distributions, as implemented in SCIPY [73],

$$\mathrm{KL}(p_T, p_N) = \begin{cases} p_T \left[ \ln\left( \frac{p_T}{p_N} \right) - 1 \right] + p_N & p_T > 0,\ p_N > 0 \\ p_N & p_T = 0,\ p_N \ge 0 \\ \infty & \text{otherwise} \end{cases} . \tag{30}$$

The noise levels of NISQ devices fluctuate over time, and so to account for these statistical variations we implemented a jackknife procedure [74] for the mean and variance including bias correction, utilizing 25 experimental runs per day of $2^{13}$ = 8192 shots each across 5 days. We further refine each experimental run using the measurement noise filter implemented in QISKIT [4], which produces a measurement noise correction map from a collection of calibration measurements which are performed immediately before the experimental shots.
The results of our jackknife analysis on the experimental measurement counts per state are shown in Fig. 10. Here, panels (a) and (c) are the results for the hand-compiled model circuit in Fig. 9(a) and panels (b) and (d) are for the autocompiled circuit in Fig. 9(b). Panels (a) and (b) are run on the IBMQ-X2 device, and panels (c) and (d) are on the IBMQ-Vigo device. In all panels, the rightmost green bar represents the ideal counts given by the model probability vector p =


FIG. 9. Comparison of hand-compiled and auto-compiled circuits for exactly solvable test case. The exactly solvable benchmark with three
physical qubits (0, 1, and 3) and a single ancilla qubit (2) implemented as a quantum circuit using the hand-compiled circuits in Fig. 7 (top) or
the autocompiled gates in Fig. 8 (bottom).

(8/31, 18/31, 5/31), the center blue bars are the raw experimental measurements without noise calibration applied, and the leftmost orange bars are the experimental measurements with the noise calibration applied. The black lines centered on the tops of the bars indicate the 1σ confidence intervals from the jackknife procedure. As noted above, qubits 0, 1, and 3 map to the probabilities p0, p1, and p2, respectively, and qubit 2 is the ancilla. Clearly, the application of the measurement

[Figure 10 plot not reproduced: four bar-chart panels (a)–(d). Vertical axis: Counts (0 to 5000). Horizontal axis: the sixteen four-qubit measurement outcomes, 1111 through 0000. Legend: Uncorrected, Corrected, Expected.]

FIG. 10. Comparison of ideal, measured, and noise-corrected measurement outcomes on quantum hardware. The results of the hand-
compiled benchmark model shown in Fig. 9(a) are jackknifed over several days of independent experimental runs on the IBMQ-X2 [panel
(a)] and IBMQ-Vigo (c) hardware. Similarly, the jackknifed results for autocompiled benchmark model shown in Fig. 9(b) are shown for
the IBMQ-X2 (b) and IBMQ-Vigo (d) hardware. The rightmost green bars are the noiseless expectations, the center blue bars are the raw
measurements, and the left orange bars have a measurement filter applied. Black lines indicate 1σ confidence intervals.
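To make the benchmark and its metric concrete, the following sketch emulates the sequential preparation scheme for the exactly solvable model classically and scores the sampled frequencies with the convex KL divergence of Eq. (30). It rests on several assumptions that are not from the source: the ancilla is tracked as a classical bit (valid here because each measurement outcome of Eqs. (27) and (28) leaves the ancilla in a definite basis state), Eq. (30) is summed over outcomes, the divergence is written out by hand rather than calling SciPy, and the names `angles`, `sample_once`, and `convex_kl` are hypothetical.

```python
import math
import random
from collections import Counter


def angles(p):
    """(cos, sin) of theta_j with cos^2 = sum_{i<j} p_i / sum_{i<=j} p_i."""
    out, partial = [], 0.0
    for p_j in p:
        total = partial + p_j
        out.append((math.sqrt(partial / total), math.sqrt(p_j / total)))
        partial = total
    return out


def sample_once(p, rng):
    """One noiseless pass of the sequential scheme, from site N-1 down to site 0."""
    n = len(p)
    cs = angles(p)
    bits = [0] * n
    ancilla = 0  # ancilla starts in |0>
    for j in range(n - 1, -1, -1):
        _, sin_j = cs[j]
        if j == n - 1 or ancilla == 1:
            # Branches of Eqs. (27)/(28): emit a 1 with probability sin^2(theta_j).
            if rng.random() < sin_j ** 2:
                bits[j], ancilla = 1, 0
            else:
                ancilla = 1
        # ancilla == 0 on a lower site: |0a 0q> is mapped to itself, emit 0.
    return tuple(bits)


def convex_kl(p_true, p_est):
    """Eq. (30) applied elementwise and summed over outcomes (assumed convention)."""
    total = 0.0
    for t, e in zip(p_true, p_est):
        if t > 0 and e > 0:
            total += t * math.log(t / e) - t + e
        elif t == 0 and e >= 0:
            total += e
        else:
            return math.inf
    return total


p = [8 / 31, 18 / 31, 5 / 31]
rng = random.Random(1234)
shots = 2 ** 13
counts = Counter(sample_once(p, rng) for _ in range(shots))
est = [counts[tuple(1 if k == i else 0 for k in range(len(p)))] / shots
       for i in range(len(p))]
print(est, convex_kl(p, est))
```

In this noiseless emulation every sample contains exactly one "hot" bit, so any weight on other outcomes in the hardware data reflects device errors rather than the model.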


noise filter improves the fidelity of the results. Also, generally speaking, the results for the Vigo device (lower panels) are closer to the ideal results than for the X2 (upper panels). The largest probability state resulting from errors is the state |0000⟩ with no “hot” physical bits, followed by |1100⟩, with the two highest probability physical bits “hot.” We note that the outcomes involving the ancilla qubit in the |1⟩ state can be removed in postselection by virtue of the fact that the sequential preparation scheme should end with the ancilla in the |0⟩ state (see Sec. II A), but this results in small corrections for the present case. Finally, we see that the autocompiled results using the approach of Sec. III (right panels) are generally closer to the ideal results than the hand-compiled circuits (left panels), though this is not true for each state individually.
In Fig. 11, we display the KL divergence between the ideal, noiseless probabilities of measuring each individual quantum state and the measurement probabilities estimated from 25 experiments of $2^{13}$ shots without (filled symbols) and with (empty symbols) the measurement noise calibration filter applied. The x axis denotes consecutive experimental days, and the horizontal lines indicate the KL divergence resulting from the distributions averaged over all days. Clearly, the application of the measurement noise filter improves the estimation of probabilities, as indicated by a lower KL divergence with respect to the ideal results. In addition, the autocompiled circuits (squares) show a lower KL divergence than the hand-compiled circuits, likely due to their shallower circuits. Finally, we find that the Vigo results in panel (b) have lower KL divergence than the X2 results in panel (a), indicating an overall lower noise level for these days, in spite of the day-to-day fluctuations in the KL divergence being comparable in magnitude between the two machines.

[Figure 11 plot not reproduced: two panels. Vertical axis: KL Divergence. Horizontal axis: Day (1 to 5). Legend: Hand-Compiled, Uncorrected; Hand-Compiled, Corrected; Auto-Compiled, Uncorrected; Auto-Compiled, Corrected.]

FIG. 11. Convex KL divergence between ideal and measured QAML outcomes with time. The convex KL divergence Eq. (30) between the ideal, noiseless measurement probabilities and the measurement probabilities inferred from 25 experiments of $2^{13}$ shots is shown as a function of experimental run day for the IBMQ-X2 (top) and IBMQ-Vigo (bottom) devices. Filled symbols use the raw experimental counts (center blue bars in Fig. 10) and empty symbols use the counts with measurement noise filter applied (left orange bars in Fig. 10). Lines indicate the KL divergences computed using all measurements from all days.

V. EXAMPLE USING THE MNIST DATASET

The exactly solvable benchmark presented in Sec. IV provided data that was simple and well-structured enough that it could be exactly memorized using a single qubit to mediate bitstring correlations. In this section, we consider a QAML benchmark that is again a generative MPS Born machine, analogous to Sec. IV, but using data from the MNIST handwritten digit dataset [25], a canonical ML test case. The MNIST dataset consists of greyscale images, each consisting of 28 × 28 pixels, of the numbers 0 through 9. We process the data by twice passing it through a filter that returns the max value from every contiguous 2 × 2 pixel block, resulting in images of size 7 × 7. While this is not necessary, and produces less raw data available for learning, it reduces the number of isometries in the sequential preparation scheme to compile, allowing for both a more detailed case-by-case analysis and reducing the overall gate depth for the ancilla in the sequential preparation scheme. Our next step in processing this data is to binarize the greyscale images, such that we can use the binary qubit encoding for simplicity. Examples of the processed data are shown in Fig. 12.

FIG. 12. Example processed MNIST data produced by downsampling through a max filter to 7 × 7 pixels and binarization. Clockwise from top left, the truth labels are 5,0,4,1,9,4,1,3,1,2.

In this work, we explore this dataset in the small-data regime where MPS models of modest bond dimension can memorize all patterns, analogous to the benchmark in Sec. IV, but the bitstrings demonstrate more complex correlations that require more quantum resources. The bitstrings are specified by mapping the 7 × 7 pixel arrays into binary vectors of length 49. As an example of the classical optimization procedure, an MPS Born machine with χ = 8 (three ancilla qubits) converges to a negative log-likelihood of 2.563 following roughly 400 iterations of single-site gradient descent with a learning rate of η = 10^{−4} (see Sec. II B) on the N_T = 10 item dataset shown in Fig. 12. Using χ = 16 (four ancilla qubits), we reach the theoretical minimum value of the negative log-likelihood


of ln N_T after roughly 200 iterations. The convergence behavior of these log-likelihoods is shown in Fig. 13, together with samples drawn from the model before and after optimization. Because the χ = 8 model does not reach the theoretical minimum, data elements not seen in the training data are present in the samples. In contrast, the χ = 16 model reaches the theoretical minimum, and so only produces data samples from the training data.

FIG. 13. Convergence of classical TN optimization. The convergence behavior of the negative log-likelihood is shown for χ = 8 (red solid lines) and χ = 16 (blue dashed lines) in (a). Both results use single-site gradient descent with a learning rate of η = 10^{−4}. (b) displays sample data drawn from the initial, random χ = 16 MPS model before optimization, (c) shows samples drawn from the χ = 8 MPS model after optimization, and (d) shows samples drawn from the χ = 16 MPS model after optimization. Because of the small size of the dataset, the χ = 16 model is able to reach the theoretical minimum and so memorize the full dataset.

Following the classical optimization of the MPS tensors, we clean the MPS to remove small numerical values from the classical optimization procedure, place it into left-canonical form to describe a sequential preparation scheme, and then fix the remaining permutation ambiguity in the bond degrees of freedom using the transformation to diagonal gauge described in Sec. III A. For the case in which the diagonality center is at site 35, we find the isometries collected in Figs. 20–68 of Appendix E. We compile these isometries using our greedy compilation procedure with a cost function [Eq. (23)] tolerance of ε = 5 × 10^{−4} except for the diagonality center, which we compile using the methods of Ref. [70] following the straightforward unitary completion procedure defined in Sec. III B; the results are again shown in Figs. 20–68 of Appendix E. In these figures, the “raw” circuits may include single-qubit rotations with rotation angles near zero and non-native parameterized gates as described in Appendix C, while the “expanded and cleaned” circuits compile the non-native gates into single-qubit gates and CNOTs, collect adjacent single-qubit rotations, and then remove small single-qubit rotations with additional optimization passes. We note that ε = 5 × 10^{−4} translates into roughly √ε ∼ 2% error in the elements of the compiled unitary, which is comparable to the entangling gate error rates of current cloud-based machines.
To investigate the fidelity of the compiled model, we implemented the sequential preparation procedure in which a single data qubit is coupled to three ancilla qubits using the isometries in Figs. 20–68 on the IBM qasm hardware simulator. As described in Sec. II A, the isometries are applied from site 48 down to site 0 with data qubit measurement in the z basis and reinitialization in the |0⟩ state between the application of isometries. The outcomes of these measurements constitute a data sample of the model. While measurement and reinitialization in the midst of an experimental run are not supported on the current IBM quantum hardware, this operation is supported in the hardware simulators. Example data generated from $2^{13}$ runs on ideal, noiseless hardware is shown in Fig. 14. Here, the data samples with probability ≥ 1% are shown together with their probabilities. The training data from Fig. 12 are clearly recognized as the elements with highest probability, save for the digit 5, whose highest probability data sample involves confusion with the 1 digit. We recall that deviations from the ideal training digits and their ideal occurrence probabilities of 10% are a result of the restriction of the model to χ = 8 as well as the finite optimization tolerance ε. Comparing with Fig. 13(c), which shows samples taken from the classically trained model, it appears that the restriction to χ = 8 has a greater influence on the probabilities and sample variations than the finite compilation tolerance.

[Figure 14 image not reproduced: sixteen sampled digit images labeled p = 0.10, 0.09, 0.09, 0.09, 0.08, 0.08, 0.07, 0.06, 0.06, 0.06, 0.03, 0.03, 0.02, 0.02, 0.02, 0.01.]

FIG. 14. Example data sampled from MNIST model on simulated noiseless quantum hardware. The data samples with probability ≥ 1% obtained from the sequential preparation procedure defined by the isometries in Figs. 20–68 are displayed together with their probability of occurrence estimated from $2^{13}$ shots.

To investigate the effects of hardware noise, we use a simple model of depolarization noise in which we take ξ = 1 − F to be the average gate error, with F the average gate fidelity.
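The depolarizing model parametrized by the average gate error ξ can be sanity-checked numerically using the relations of Eqs. (31)–(33). The sketch below converts ξ into the depolarizing probability p, evaluates the squared prefactors of the Kraus operators, and confirms both trace preservation and that the average gate fidelity equals 1 − ξ. The fidelity check uses the standard relation F = (d F_pro + 1)/(d + 1) between average and process fidelity, with d = 2^{N_q}, which is an outside fact rather than something stated in the source; the function names are hypothetical.

```python
def depolarizing_probability(xi, n_qubits):
    """p = 2^Nq * xi / (2^Nq - 1), the relation quoted with Eq. (31)."""
    d = 2 ** n_qubits
    return d * xi / (d - 1)


def kraus_weights(p, n_qubits):
    """Squared prefactors of the identity and Pauli Kraus operators in Eq. (32)."""
    n_pauli = 4 ** n_qubits
    w_identity = 1.0 - (n_pauli - 1) * p / n_pauli
    w_pauli = p / n_pauli
    return w_identity, w_pauli


def average_gate_error(p, n_qubits):
    """1 - F, using F = (d * F_pro + 1) / (d + 1) with F_pro the identity weight."""
    d = 2 ** n_qubits
    w_identity, _ = kraus_weights(p, n_qubits)
    return 1.0 - (d * w_identity + 1.0) / (d + 1.0)


xi2 = 0.01          # CNOT error rate used for the noisy simulation
xi1 = xi2 * 1e-2    # single-qubit error rate, as assigned in the model

for xi, n_qubits in ((xi1, 1), (xi2, 2)):
    p = depolarizing_probability(xi, n_qubits)
    w_i, w_p = kraus_weights(p, n_qubits)
    # Every Pauli string squares to the identity, so the Kraus completeness
    # sum reduces to w_i + (4^Nq - 1) * w_p.
    completeness = w_i + (4 ** n_qubits - 1) * w_p
    print(n_qubits, p, completeness, average_gate_error(p, n_qubits))
```

For the two-qubit case this gives p = 4ξ/3, so a 1% CNOT error corresponds to a depolarizing probability of about 1.33%.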


FIG. 15. Convex KL divergence between sampled data without and with hardware noise. The convex KL divergence Eq. (30) between a noiseless and noisy simulation is shown as functions of the CNOT error rate $\xi_2$ and the readout error ζ. The probability distributions of both the noiseless and noisy models are estimated from $2^{13}$ shots using a hardware simulator. The vertical dashed line indicates 1% CNOT error rate, a rough measure of the current state-of-the-art for NISQ devices. [Plot: KL divergence (0.0 to 1.0) versus CNOT error rate (0.000 to 0.025), with curves for ζ = 0.00, 0.01, 0.03, 0.04, and 0.05.]

FIG. 16. Example data sampled from MNIST model on simulated noisy quantum hardware. The highest probability data samples from the sequential preparation procedure defined by the isometries in Figs. 20–68 on a simulated machine with depolarization error rate $\xi_2 = 0.01$ (dashed line in Fig. 15) are displayed together with their probability of occurrence estimated from $2^{13}$ shots. [Image panels: 16 samples with probabilities p = 0.0010, 0.0006, 0.0005 (five samples), and 0.0004 (nine samples).]

FIG. 17. Convex KL divergence variation with noise and bitstring length. The convex KL divergence Eq. (30) between a noiseless and noisy simulation is shown as a function of the CNOT error rate $\xi_2$ and the sampled bitstring length at zero readout error (ζ = 0). The probability distributions of both the noiseless and noisy models are estimated from $2^{13}$ shots using a hardware simulator. [Plot: KL divergence versus number of bits (0 to 50) and CNOT error rate (0.000 to 0.025).]

With this, the depolarizing channel is represented by the operator

$$\hat{E}_{\mathrm{dep}} = (1 - p)\hat{I} + p\hat{D}, \tag{31}$$

in which $p = 2^{N_q}\xi/(2^{N_q} - 1)$ with $N_q$ the number of qubits and the Kraus representation of the depolarizing channel is given by the operators

$$\hat{E}_{\mathrm{dep}} = \left\{ \sqrt{1 - (4^{N_q} - 1)p/4^{N_q}}\;\hat{I}^{\otimes N_q},\; \sqrt{p/4^{N_q}}\;\hat{P} \right\} \tag{32}$$

in which

$$\hat{P} = \{\hat{I}, \hat{\sigma}_x, \hat{\sigma}_y, \hat{\sigma}_z\}^{\otimes N_q} \setminus \hat{I}^{\otimes N_q}, \tag{33}$$

is the set of $N_q$-qubit products of Pauli operators without the $N_q$-qubit identity matrix. In our model, we assign the same error $\xi_2$ to all CNOT gates and an error $\xi_1 = \xi_2 \times 10^{-2}$ to all single-qubit gates in accordance with typical IBM hardware characteristics. In addition to the depolarizing error, we also include an uncorrelated-qubit (tensor product) readout noise model parameterized by [75,76]

$$P(0|0) = 1 - \zeta, \qquad P(1|0) = \zeta, \tag{34}$$

$$P(0|1) = 2\zeta, \qquad P(1|1) = 1 - 2\zeta, \tag{35}$$

in which $P(a|b)$ denotes the probability of obtaining measurement outcome a from a preparation of the quantum state $|b\rangle$, assumed identical across all qubits for simplicity.

We again characterize the difference in predictions between the ideal and noisy outcomes using the convex KL divergence defined in Eq. (30). As before, we estimate the probabilities from ensembles of $N_s = 2^{13}$ shots calculated using a hardware simulator for both the noiseless “truth” model and the noisy models. This divergence is shown in Fig. 15 as a function of the error parametrizations $\xi_2$ and ζ. The CNOT error rate $\xi_2$ parameterizes both the two-qubit and single-qubit depolarization errors and ζ parameterizes the readout error. The divergence shows a rapid rise driven by the appearance of data samples not present in the noiseless model. The samples with greatest occurrence drawn from the model evaluated at the CNOT error rate $\xi_2 \sim 0.01$, indicated by the dashed vertical line in Fig. 15, and ζ = 0 are shown in Fig. 16. We can recognize many of the digits from the training set in this model, but they occur with significantly lower probabilities due to the appearance of additional noise-driven patterns. Because of the sequential preparation, we can expect that the bits near the end of the bitstring (i.e., for sites near 48) are produced at higher fidelity than those of lower site indices because of errors present in manipulating the ancilla qubits. We see that this is the case in Fig. 17, which displays the convex KL divergence as functions of the CNOT error rate


$\xi_2$ and the number of bits sampled in the bitstring at zero measurement error ζ = 0. Nonmonotonic behavior is due to the significant differences in complexity of the gate sequences to produce individual bits, see Figs. 20–68 of Appendix E. A rise in error is seen as the number of bits increases as the total gate depth increases, peaking around ten bits. After this point the KL divergence levels off as additional bits may require shallower gate sequences, bringing the overall agreement between the noisy and true probability distribution closer.

VI. CONCLUSIONS AND OUTLOOK

We have presented a complete workflow for generative quantum-assisted machine learning (QAML) using Born machines with a matrix product state (MPS) tensor network (TN) structure. In our workflow, classical data is encoded into quantum states using an embedding map, the ensemble of quantum states is learned as a TN Born machine using a classical DMRG-like procedure with gradient descent of the negative log-likelihood, and the model is compiled into operations for target quantum hardware to obtain data samples as measurement outcomes. Using MPS-based models enables the use of highly quantum resource-efficient sequential preparation schemes requiring $O(1)$ qubits for a classical data vector length N and $O(\log_2 \chi)$ qubits for bond dimension χ, which encapsulates the model expressivity. This condition is also sufficient; any model that uses a sequential preparation procedure to generate samples using only $O(1)$ qubits for a classical data vector length N can be expressed as an MPS and so our methods are applicable. We expect this class of maximally quantum resource-efficient models will be of paramount importance when benchmarking QAML applications on near-term devices.

We presented several optimizations in the compilation stage of our QAML workflow, such as the introduction of the diagonal gauge of the MPS model that utilizes inherent freedom in the model representation to reduce the complexity of the compiled model, as well as greedy heuristics for finding shallow gate sequences matching a target isometry to a specified tolerance given hardware topology and allowed gate constraints. We presented an exactly solvable benchmark model requiring two qubits, and assessed its performance on currently available quantum hardware. We also presented an example application modeling features extracted from the MNIST dataset parametrically with depolarizing and readout hardware noise using a hardware simulator.

Our results lay the groundwork for utilizing TN models in practical QAML applications, and leave several avenues for future research. First, the QAML demonstrations given in this work consist of overfit models, and so do not constitute “true” machine learning models which should be able to appropriately generalize from data. This is a result of either using data with very simple structure, as in our exactly solvable model, or using a very small sample size of training data, as in our MNIST application. Small sample sizes were used in the present work to enable detailed analysis of model performance with limited quantum resources. In future work, the generalization power of TN-based QAML models on NISQ hardware will be explored moving towards the large-data regime. We also note that other studies have indicated that TN models with current training strategies generally have a tendency towards overfitting [77]. Second, we have focused on the applications of MPSs to generative modeling, but other TN structures, such as tree tensor networks [39,78] may also be useful for QAML applications, as well as other tasks such as feature extraction and classification. The procedures outlined in this paper can be readily adapted to compiling the isometries appearing in models for other TNs and other applications. Finally, the procedure outlined in this paper wherein a model is trained classically before being compiled to a quantum device cannot by itself yield a quantum advantage, as it requires the model to be both classically and quantumly simulable. However, our procedures will be useful in designing and analyzing TN-inspired model structures for scaling towards the classically intractable regime, and can also serve as “preconditioners” where a model trained using optimal classical strategies is augmented with additional quantum resources and then trained directly on the quantum device or in a hybrid quantum/classical optimization loop, potentially avoiding local minima and speeding up optimization times.

ACKNOWLEDGMENTS

We would like to thank Dave Clader, Giuseppe D’Aguanno, and Colin Trout for useful discussions and would like to acknowledge funding from the Internal Research and Development program of the Johns Hopkins University Applied Physics Laboratory.

APPENDIX A: PROCEDURE FOR CONVERTING AN MPS MODEL INTO A SEQUENTIAL PREPARATION SCHEME

The purpose of this section is to review the procedure for converting between matrix product state models and a sequential preparation procedure with a χ-level ancilla. To begin, we first consider that we have a register of N qubits with states $|j_i\rangle$, $i = 0, \ldots, N-1$, $j_i = 0, 1$ in which we want to encode data and a χ-level ancilla $|\alpha\rangle$, $\alpha = 0, \ldots, \chi - 1$ that can be used to entangle the qubits. Starting at the “right” end of the system, we can initialize the (N − 1)st qubit using an operator $\hat{L}^{[N-1]}$ defined as

$$\hat{L}^{[N-1]} = \sum_{\alpha, j_{N-1}} L_{\alpha}^{[N-1] j_{N-1}}\, |j_{N-1}\, \alpha\rangle\langle 00|, \tag{A1}$$

in which the coefficients $L^{[N-1]}$ satisfy the isometry condition

$$\sum_{\alpha, j_{N-1}} L_{\alpha}^{[N-1] j_{N-1} \star}\, L_{\alpha}^{[N-1] j_{N-1}} = 1. \tag{A2}$$

Clearly, if we start our qubit and ancilla system in the state $|00\rangle$, this operation transforms it into the (entangled) state $\sum_{\alpha j_{N-1}} L_{\alpha}^{[N-1] j_{N-1}} |\alpha\, j_{N-1}\rangle$, and the isometry condition ensures that this state is normalized. Moving to the next qubit, we now entangle it with the ancilla using the operator

$$\hat{L}^{[N-2]} = \sum_{\alpha, j_{N-2}, \beta} L_{\beta\alpha}^{[N-2] j_{N-2}}\, |j_{N-2}\, \beta\rangle\langle 0\, \alpha|, \tag{A3}$$


in which the coefficients $L_{\beta\alpha}^{[N-2] j_{N-2}}$ expressed as a matrix $L^{[N-2] j_{N-2}}$ are subject to the isometry condition

$$\sum_{j} L^{[N-2] j \dagger} L^{[N-2] j} = I_{\chi}, \tag{A4}$$

with $I_\chi$ the χ × χ identity matrix. This operation now puts the system in the state

$$\hat{L}^{[N-2]} \hat{L}^{[N-1]} |0_{N-2}\, 0_{N-1}\, 0_{\mathrm{ancilla}}\rangle = \sum_{j_{N-2}, j_{N-1}, \alpha} \left[L^{[N-2] j_{N-2}} L^{[N-1] j_{N-1}}\right]_{\alpha} |j_{N-2}\, j_{N-1}\, \alpha\rangle. \tag{A5}$$

We follow this same logic for all subsequent qubits, defining isometric operators that entangle them to the rest of the system using the ancilla, until we reach qubit 1, which is attached using the isometric operator

$$\hat{L}^{[0]} = \sum_{i_0, \beta} L_{\beta}^{[0] i_0}\, |i_0\, 0\rangle\langle 0\, \beta|. \tag{A6}$$

This operator puts the full system into the state

$$\hat{L}^{[0]} \cdots \hat{L}^{[N-1]} |0_0 \ldots 0_{N-1}\, 0_{\mathrm{ancilla}}\rangle = \sum_{j_0 \ldots j_{N-1}} L^{[0] j_0} \cdots L^{[N-1] j_{N-1}}\, |j_0 \ldots j_{N-1}\, 0_{\mathrm{ancilla}}\rangle. \tag{A7}$$

Hence, in the last step, the qubit states decouple from the ancilla. The qubit state takes the form of an MPS with the additional constraint that each of the MPS matrices L satisfies the left-orthogonal condition Eq. (A4). The above procedure can readily be read in reverse; given a general MPS QAML model with bond dimension χ,

$$\sum_{j_0 \ldots j_{N-1}} A^{[0] j_0} \cdots A^{[N-1] j_{N-1}}\, |j_0 \ldots j_{N-1}\rangle, \tag{A8}$$

we can convert it into a sequential qubit preparation scheme with a χ-dimensional ancilla by putting the MPS in left-canonical form. This transformation to left-canonical form can be done without loss of generality using a well-known procedure involving an orthogonal decomposition, e.g., the singular value or QR decomposition [19]. Thus the tensors appearing in an MPS, which could result from a classical training optimization, can be formally (i.e., modulo compilation into native quantum operations for a given hardware architecture) translated into operations for deployment on quantum resources.

The above prescription assumed the presence of a register of N qubits, but due to the sequential nature of the preparation this is unnecessary, and a single “physical” qubit together with the χ-level ancilla suffices, provided we are not measuring any multiqubit properties of the state. As an example, we will consider drawing a sample from an MPS wave function generative model with the binary map Eq. (1). In this application, we first couple the qubit and ancilla as in Eq. (A1) starting from both in the fiducial state $|0\rangle$. We then measure the qubit in the computational basis, record its outcome as $x_{N-1}$, and then return it to the fiducial $|0\rangle$ state while leaving the ancilla unmeasured.

We note that the ability to re-initialize a single qubit independent of the others is not universally available in present-day hardware, but has been demonstrated in, e.g., trapped ion platforms [64]. We then re-entangle the ancilla and qubit using the operator $\hat{L}^{[N-2]}$ defined in Eq. (A3), measure the qubit and record the outcome as $x_{N-2}$, and again return the qubit to the $|0\rangle$ state. This procedure is repeated with the other operations $\hat{L}^{[j]}$ until a complete set of N measurements x is made, which constitutes a data sample. This procedure is denoted graphically in Fig. 1(d). Clearly, this only requires a single “physical” or “data” qubit (i.e., the one that is sampled) independent of the input data size N, and the construction of the χ-level ancilla requires only $\log_2 \chi$ qubits. We stress that the scheme above is formal in the sense that it produces isometries acting on quantum resources without reference to their actual physical representation or other hardware constraints such as limited coherence time, connectivity, gate sets, etc. The translation of these formal isometries into operations to be dispatched on a given target hardware is detailed in Sec. III.

APPENDIX B: RESOLVING AMBIGUITIES IN THE TRANSFORMATION TO THE DIAGONAL GAUGE

In this section we provide additional detail on the resolution of two ambiguities that arise when converting MPS isometries into the diagonal gauge, as described in Sec. III A. The permutation operation Eq. (21) is ambiguous whenever the matrix $U^{[i]}$ derived from the polar decomposition of $M^{[i]}$ [see Eq. (20)] has multiple elements in a column with the same absolute value. Recalling that our sequential MPS preparation scheme requires that the ancilla start and end in the vacuum state, we see that this occurs for tensors near the extremal values of the representation when an ancilla qubit is first utilized or an ancilla qubit is decoupled from the remaining qubits. In such cases, we use the following alternate procedure to decide between permutations. First, we enumerate all basis permutations resulting from these ambiguities for a given tensor $L^{[i] j_i}$ and construct their associated isometries $\hat{\tilde{L}}^{(\zeta)}$, in which ζ indexes permutations. To decide between these permutations, we again would like to make this operator as “diagonal” as possible, in the sense of minimizing the number of qubit operations being applied. We construct a simple cost function as follows: for each state indexed by the ancilla state α and the physical qubit q as above, we convert the state index into its binary representation b, which effectively maps the ancilla state onto a collection of $\log_2 \chi$ qubits. As an example, the states of a four-dimensional ancilla and a single physical qubit give the representations

$$\mathrm{index}(|0, 0\rangle) = 0 \to (0, 0, 0), \tag{B1}$$
$$\mathrm{index}(|0, 1\rangle) = 1 \to (0, 0, 1), \tag{B2}$$
$$\mathrm{index}(|1, 0\rangle) = 2 \to (0, 1, 0), \tag{B3}$$
$$\mathrm{index}(|1, 1\rangle) = 3 \to (0, 1, 1), \tag{B4}$$
$$\vdots \tag{B5}$$
$$\mathrm{index}(|3, 1\rangle) = 7 \to (1, 1, 1). \tag{B6}$$

We now calculate a distance between two basis states $(\alpha, j)$ and $(\alpha', j')$ with respective binary representations $b$ and $b'$ as $D[(\alpha, j), (\alpha', j')] = \left(\sum_{\mu} |b_\mu - b'_\mu|\right)^2$. The term in parentheses counts the number of individual qubit “flips” required to


convert one of the states into the other, and the square strongly penalizes multiqubit coordinated flips. We then use the cost function

$$C_{\zeta} = \mathrm{Tr}\left(|L^{(\zeta)}|\, D\right), \tag{B7}$$

in which D is the matrix with D[•, •] as elements and $|L^{(\zeta)}|$ is the matrix of absolute values of $L^{(\zeta)}$, to choose from between the $L^{(\zeta)}$.

In addition to this permutation ambiguity, there is also a sign ambiguity on each of the bond states of the isometry. We again use diagonal dominance to fix this sign ambiguity in the following fashion: when our diagonal gauge update sweep is proceeding towards the tensor to the right, we are updating the right bond basis of the current tensor. Hence, we identify the nonzero element closest to the diagonal in each column, and reverse the sign of this column if the element is negative. In order that we do not change the overall state, we multiply the elements in each corresponding row of the tensor to the right in the MPS description by −1. Similarly, in a left-moving sweep, we identify the nonzero elements closest to the diagonal in each row, flip signs if this element is negative, and flip signs in the corresponding columns of the tensor to the left in the MPS representation. Following transformation to diagonal gauge, we fix the signs of all elements of the diagonality center (chosen, as above, to be a permutation operator) to be positive. Any sign flips that occur in this transformation are propagated through neighboring MPS tensors until a row or column with elements of mixed sign is identified. This procedure discourages the appearance of permutation operators whose elements are both positive and negative.

APPENDIX C: IMPLEMENTATION AND OPTIMIZATION DETAILS OF GREEDY COMPILATION HEURISTICS

In this section, we briefly note details of our implementation of the greedy compilation heuristics of Sec. III B, along with some problem-specific optimizations. Our subroutine for the cost function takes as input a vector of parameters θ, constructs a matrix representation of the parameterized gate sequence

$$\hat{U}(\boldsymbol{\theta}) = \hat{M}_{N_G}(\boldsymbol{\theta}_{N_G}) \cdots \hat{M}_1(\boldsymbol{\theta}_1), \tag{C1}$$

in which $\boldsymbol{\theta}_i$ is the vector of parameters used by gate i, and then evaluates the cost function Eq. (23). This enables us to obtain analytic gradients of the cost function also as elements of products of matrices. We optimize the cost function using the BFGS method, and allow for multiple batches of input parameters with random variations added to avoid local minima. Additionally, as noted above, all of the isometries that result from the use of a real-valued quantum embedding map will be real, and so we can restrict our attention to real-valued gates. Hence, in our implementation, we parametrize single-qubit gates as y rotations

$$\hat{R}_y(\theta) \equiv \begin{pmatrix} \cos\frac{\theta}{2} & -\sin\frac{\theta}{2} \\ \sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{pmatrix}, \tag{C2}$$

which relate to the gates in Eq. (22) as $\hat{R}_y(\theta) = \hat{U}_3(\theta, 0, 0)$, and CNOTs for the entangling gates. While we have made the above gate choices for use in this paper, we stress that our methods apply to any other choice of single-qubit and entangling gates. While there is no guarantee that there are not operations with fewer entangling gates that could be found using complex-valued gates, we find that the reduction in the number of parameters when using real gates significantly improves the optimization time.

We utilize sparse representations of the individual parameterized gate elements $M_a(\boldsymbol{\theta}_a)$, but not for the full unitary $\hat{U}(\boldsymbol{\theta})$, as the latter is not guaranteed to be sparse. The matrix product in Eq. (C1) and the related expression for gradients dominates the computational scaling of the algorithm, which is $O(\chi^3)$ in the bond dimension χ. This is the same asymptotic scaling with χ appearing in the DMRG-type optimization of the MPS network itself, described in Sec. IV A.

The final optimization we have included is to introduce longer gate sequence “motifs” into the optimization alongside the native entangling gates. In particular, the two motifs we have utilized in our work are a two-qubit rotation gate

$$\hat{S}(\theta, \theta') = \begin{pmatrix} \cos\frac{\theta - \theta'}{2} & 0 & 0 & \sin\frac{\theta - \theta'}{2} \\ 0 & \cos\frac{\theta + \theta'}{2} & \sin\frac{\theta + \theta'}{2} & 0 \\ 0 & -\sin\frac{\theta + \theta'}{2} & \cos\frac{\theta + \theta'}{2} & 0 \\ -\sin\frac{\theta - \theta'}{2} & 0 & 0 & \cos\frac{\theta - \theta'}{2} \end{pmatrix}, \tag{C3}$$

which is allowed between any two qubits that have CNOT connectivity, and a version of the $\hat{S}$ gate we call $\hat{F}$ that is controlled on a third qubit. We find that the former gate can be compiled using two CNOTs using the ansatz sequence shown in Eq. (C4)

    Ry(φ0) --●-- Ry(φ1) --●-- Ry(φ2)
    Ry(φ0) --⊕-- Ry(φ1) --⊕-- Ry(φ2)        (C4)

and the latter gate with control on qubit c and the operation $\hat{S}$ applied to qubits $q_1$ and $q_2$ can be constructed using

$$\hat{F}_{c; q_1 q_2}(\theta, \theta') = \mathrm{CNOT}(c, q_2)\, \mathrm{CNOT}(c, q_1)\, \hat{S}_{q_1 q_2}\!\left(-\frac{\theta}{2}, -\frac{\theta'}{2}\right) \times \mathrm{CNOT}(c, q_2)\, \mathrm{CNOT}(c, q_1)\, \hat{S}_{q_1 q_2}\!\left(\frac{\theta}{2}, \frac{\theta'}{2}\right). \tag{C5}$$

Hence, $\hat{S}$ gates require 2 CNOTs for compilation and $\hat{F}$ gates require 8 CNOTs for compilation. Both gates were identified from experiments with the greedy optimization procedure outlined above using only CNOTs, and their direct inclusion into the optimization enables more rapid convergence. As these gates require multiple entangling gates, it is useful to introduce a heuristic penalty function h into the cost function for ordering the next priority queue to ensure that they are not chosen over shorter gates with a similar cost function. Such a penalty function was advocated in Ref. [69], and could also be used to account for, e.g., hardware-dependent noise [79]. The choice of this penalty function will be problem-specific, and finding ways for optimizing it in a data-driven fashion for problems of interest is an intriguing area for further research.
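The gate definitions in Eqs. (C2), (C3), and (C5) can be checked numerically. The following NumPy sketch is an illustration rather than the implementation used in this work: the function names `ry`, `s_gate`, and `f_gate`, the tensor-factor ordering (c, q1, q2) with the control as the leftmost factor, and the test angles are all assumptions made here. It builds $\hat{S}(\theta, \theta')$ from Eq. (C3), composes the sequence in Eq. (C5), and compares the result with a controlled-$\hat{S}$ operator.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
P0 = np.diag([1.0, 0.0])  # projector |0><0| on the control
P1 = np.diag([0.0, 1.0])  # projector |1><1| on the control

def kron(*ops):
    """Kronecker product of several operators, leftmost factor first."""
    out = np.eye(1)
    for op in ops:
        out = np.kron(out, op)
    return out

def ry(theta):
    """Real single-qubit y rotation, Eq. (C2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def s_gate(theta, theta_p):
    """Two-qubit rotation of Eq. (C3) in the basis |00>, |01>, |10>, |11>."""
    cm, sm = np.cos((theta - theta_p) / 2), np.sin((theta - theta_p) / 2)
    cp, sp = np.cos((theta + theta_p) / 2), np.sin((theta + theta_p) / 2)
    return np.array([[cm, 0, 0, sm],
                     [0, cp, sp, 0],
                     [0, -sp, cp, 0],
                     [-sm, 0, 0, cm]])

# CNOTs on three qubits ordered (c, q1, q2); the control is always c.
CNOT_C_Q1 = kron(P0, I2, I2) + kron(P1, X, I2)
CNOT_C_Q2 = kron(P0, I2, I2) + kron(P1, I2, X)

def f_gate(theta, theta_p):
    """Controlled S gate assembled as in Eq. (C5); the rightmost factor acts first."""
    s_plus = kron(I2, s_gate(theta / 2, theta_p / 2))
    s_minus = kron(I2, s_gate(-theta / 2, -theta_p / 2))
    return CNOT_C_Q2 @ CNOT_C_Q1 @ s_minus @ CNOT_C_Q2 @ CNOT_C_Q1 @ s_plus

theta, theta_p = 0.7, -0.3  # arbitrary illustrative angles
controlled_s = kron(P0, np.eye(4)) + kron(P1, s_gate(theta, theta_p))
f_matches = np.allclose(f_gate(theta, theta_p), controlled_s)
s_is_orthogonal = np.allclose(
    s_gate(theta, theta_p).T @ s_gate(theta, theta_p), np.eye(4))
```

Because $\hat{S}(\theta, \theta')$ is block diagonal with two planar rotations, conjugation by X ⊗ X (the action of the two CNOTs when the control is set) reverses both rotation angles. The two half-angle $\hat{S}$ gates therefore cancel for control 0 and add for control 1.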


FIG. 18. Example of greedy compilation procedure. Example gates represented as circuits and matrix plots resulting from applying the greedy compilation procedure to the isometry shown in the upper left (same isometry as in the right panel of Fig. 3). The starting ansatz is a single-qubit rotation on each qubit, given in the top center of the figure. The next row down shows the gates resulting from adding a single entangling gate to this ansatz, ordered left to right by their cost functions C. A constant penalty of 0.6 is added to the cost function for use of an $\hat{F}$ gate in ordering the priority queue, resulting in the given ordering. Green lines indicate the gates passed to the next level of optimization. This procedure terminates in the gate shown at the bottom of the figure with the given cost function tolerance of $5 \times 10^{-4}$.

We also note that the use of multiqubit controlled gates is penalized through the choice of the cost function Eq. (B7) for choosing the permutation to diagonal gauge; the choice of a cost function of 4 or 8 for a gate requiring two and three bit flips is in rough accordance with the number of CNOTs required for $\hat{S}$ and $\hat{F}$, respectively.

An example application of this procedure to the isometry shown in the right panel of Fig. 3 is given in Fig. 18. Here, we give cost function penalties of 0.6 and 0.2 for $\hat{F}$ and $\hat{S}$ gates, respectively, use a cost function tolerance of $5 \times 10^{-4}$, and keep the four lowest-cost gates to generate the priority queue from the first optimization and the two lowest-cost gates on subsequent optimizations. The successive rows show the optimized gates resulting from adding a single entangling gate to the ansatz resulting from the last round of optimization, starting with a single-qubit rotation on each qubit (top center). The green lines show the gates which are kept to form the new priority queue. Here and throughout, the quantum circuits are ordered with the physical (i.e., readout) qubit on the top line and the ancilla qubits in increasing order on lower lines. Following an optimization in which $\hat{S}$ or $\hat{F}$ gates may be used, the “raw” circuit containing these parameterized gates is then compiled into CNOTs using Eqs. (C4) and (C5), products of single-qubit rotations are collected together, and then optimization passes are run to determine if single-qubit gates with rotations smaller than a certain threshold can be removed without affecting the cost function. We note that no cost function penalty is applied when an $\hat{S}$ or $\hat{F}$ gate brings the cost function below its desired tolerance, as in the last step of the optimization shown in Fig. 18, but is only used for ordering the priority queue when no gates meet the cost function tolerance.

APPENDIX D: EXPLICIT CONSTRUCTION OF EXACTLY SOLVABLE MPS MODEL

In this section, we detail the transformation of the MPS representation of the exactly solvable model Eqs. (24)–(26) to left-canonical form, defining a sequential preparation scheme. We then further detail a “by hand” compilation of a particular unitary completion of the resulting isometries as a benchmark for our automated procedure. For simplicity of exposition, we will take all phases $\phi_j = 0$, though we will relax this condition shortly. Performing the QR decomposition on the first tensor, we find

$$A^{[0]} = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{p_0} \end{pmatrix} \tag{D1}$$

$$\to_{\mathrm{QR}} = \begin{pmatrix} 1 & 0 \\ 0 & \frac{p_0}{|p_0|} \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{p_0} \end{pmatrix}, \tag{D2}$$

$$\Rightarrow L^{[0]} = \begin{pmatrix} 1 & 0 \\ 0 & \frac{p_0}{|p_0|} \end{pmatrix}, \tag{D3}$$

$$A^{[1]0}_{00} = 1, \qquad A^{[1]1}_{01} = \sqrt{p_1}, \qquad A^{[1]0}_{11} = \sqrt{p_0}. \tag{D4}$$
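The QR step of Eqs. (D1)–(D4) can be mimicked numerically by sweeping a QR decomposition from left to right. The sketch below is a hedged illustration, not the code used in this work. It assumes a χ = 2 MPS whose only nonzero amplitudes are $\sqrt{p_j}$ for the bitstring with a single 1 at site j; this tensor form is inferred here from Eqs. (D1) and (D4) and is not quoted from Eqs. (24)–(26). The helper names (`exactly_solvable_mps`, `left_canonicalize`, `amplitude`) and the probabilities are hypothetical. NumPy's QR does not fix the signs of the columns of Q, so only sign-independent quantities are compared.

```python
import numpy as np

def exactly_solvable_mps(p):
    """Assumed chi = 2 MPS; tensor indices are (left bond, bit, right bond)."""
    tensors = []
    for pj in p:
        A = np.zeros((2, 2, 2))
        A[0, 0, 0] = 1.0           # no 1 emitted so far
        A[0, 1, 1] = np.sqrt(pj)   # emit the single 1 at this site
        A[1, 0, 1] = 1.0           # a 1 was emitted at an earlier site
        tensors.append(A)
    tensors[0] = tensors[0][:1]           # left boundary: start in bond state 0
    tensors[-1] = tensors[-1][:, :, 1:]   # right boundary: end in bond state 1
    return tensors

def left_canonicalize(tensors):
    """QR sweep: each returned tensor is an isometry; the final R holds the norm."""
    carry = np.eye(tensors[0].shape[0])
    out = []
    for A in tensors:
        A = np.tensordot(carry, A, axes=(1, 0))   # absorb R from the previous site
        chi_l, d, chi_r = A.shape
        Q, R = np.linalg.qr(A.reshape(chi_l * d, chi_r))
        out.append(Q.reshape(chi_l, d, Q.shape[1]))
        carry = R
    return out, carry

def amplitude(tensors, bits):
    """Contract the MPS for one bitstring."""
    v = np.ones(1)
    for A, b in zip(tensors, bits):
        v = v @ A[:, b, :]
    return v.item()

p = np.array([0.1, 0.2, 0.3, 0.4])   # illustrative probabilities summing to 1
L, norm = left_canonicalize(exactly_solvable_mps(p))

# Left-orthogonality, Eq. (A4): reshaped tensors have orthonormal columns.
isometric = all(
    np.allclose(T.reshape(-1, T.shape[2]).T @ T.reshape(-1, T.shape[2]),
                np.eye(T.shape[2]))
    for T in L)

# Born probabilities of the single-excitation bitstrings are unchanged.
probs = []
for j in range(len(p)):
    bits = [0] * len(p)
    bits[j] = 1
    probs.append((amplitude(L, bits) * norm.item()) ** 2)
```

The sweep leaves the physical state untouched: each R factor is pushed into the next tensor, so the product of all tensors is preserved while every individual tensor becomes an isometry suitable for sequential preparation.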


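Reshaping the second tensor and decomposing, we find

$$A^{[1]}_{(\alpha i)\beta} = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{p_1} \\ 0 & \sqrt{p_0} \\ 0 & 0 \end{pmatrix} \tag{D5}$$

$$\to_{\mathrm{QR}} = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{\frac{p_1}{p_0 + p_1}} \\ 0 & \sqrt{\frac{p_0}{p_0 + p_1}} \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{p_0 + p_1} \end{pmatrix}, \tag{D6}$$

$$\Rightarrow L^{[1]0}_{00} = 1, \qquad L^{[1]1}_{01} = \sqrt{\frac{p_1}{p_0 + p_1}}, \qquad L^{[1]0}_{11} = \sqrt{\frac{p_0}{p_0 + p_1}}. \tag{D7}$$

This generalizes to

$$L^{[j]0}_{00} = 1, \qquad L^{[j]1}_{01} = \sqrt{\frac{p_j}{\sum_{i \le j} p_i}}, \qquad L^{[j]0}_{11} = \sqrt{\frac{\sum_{i < j} p_i}{\sum_{i \le j} p_i}}, \tag{D8}$$

with the last tensor being

$$L^{[N-1]0}_{10} = \sqrt{\sum_{i < N-1} p_i}, \qquad L^{[N-1]1}_{00} = \sqrt{p_{N-1}}. \tag{D9}$$

We now take these left-canonical tensors and reshape them to correspond to isometries acting on a single physical qubit $|i_q\rangle$ and an ancilla qubit $|\alpha_a\rangle$. We start from the Nth tensor, where both the qubit and ancilla are in the state 0. The isometry is

$$\hat{L}^{[N-1]} = \left(\sqrt{1 - p_{N-1}}\, |1_a 0_q\rangle + \sqrt{p_{N-1}}\, |0_a 1_q\rangle\right)\langle 0_a 0_q|. \tag{D10}$$

Following this, the physical qubit can be measured in the computational basis and its outcome (classically) stored, and then the physical qubit is returned to the state $|0_q\rangle$. We then repeat this procedure of acting with isometries, measuring the physical qubit, classically recording its output, and returning the physical qubit to 0, using the isometries

$$\hat{L}^{[j]} = |0_a 0_q\rangle\langle 0_a 0_q| + \left(\sqrt{\frac{p_j}{\sum_{i \le j} p_i}}\, |0_a 1_q\rangle + \sqrt{\frac{\sum_{i < j} p_i}{\sum_{i \le j} p_i}}\, |1_a 0_q\rangle\right)\langle 1_a 0_q|. \tag{D11}$$

We note that Eq. (D11) also holds for the final site, j = 0, and produces an unentangled ancilla in the state $|0_a\rangle$. With these operators in hand, we can reinsert the arbitrary phases on the elements resulting in the state $|1_q\rangle$, yielding

$$\hat{L}^{[N-1]} = \left(\sqrt{1 - p_{N-1}}\, |1_a 0_q\rangle + e^{i\phi_{N-1}} \sqrt{p_{N-1}}\, |0_a 1_q\rangle\right)\langle 0_a 0_q|, \tag{D12}$$

$$\hat{L}^{[j]} = |0_a 0_q\rangle\langle 0_a 0_q| + \left(e^{i\phi_j} \sqrt{\frac{p_j}{\sum_{i \le j} p_i}}\, |0_a 1_q\rangle + \sqrt{\frac{\sum_{i < j} p_i}{\sum_{i \le j} p_i}}\, |1_a 0_q\rangle\right)\langle 1_a 0_q|. \tag{D13}$$

FIG. 19. Circuit decompositions for $\hat{U}^{[j]}$ in Eq. (D14) (top) and $\hat{U}^{[N-1]}$ in Eq. (D16) (bottom) based on a gateset of single qubit rotations and CNOTs. In both diagrams, the upper line is the physical (sampled) qubit and the lower line is the ancilla. [Circuit diagrams not reproduced. Gate labels in the top circuit: $R_y(\theta_j)$ (twice), $R_z(\pi/2)$ and $R_z(-\pi/2)$ on each line, and CNOTs; in the bottom circuit: $R_y(2\theta_{N-1})$, two X gates, and one CNOT.]

We note that there is a “natural” unitary completion of the operators in Eq. (D13) given by

$$\hat{U}^{[j]} = |0_a 0_q\rangle\langle 0_a 0_q| + |1_a 1_q\rangle\langle 1_a 1_q| + \left(e^{i\phi_j} \sqrt{\frac{p_j}{\sum_{i \le j} p_i}}\, |0_a 1_q\rangle + \sqrt{\frac{\sum_{i < j} p_i}{\sum_{i \le j} p_i}}\, |1_a 0_q\rangle\right)\langle 1_a 0_q| + \left(\sqrt{\frac{\sum_{i < j} p_i}{\sum_{i \le j} p_i}}\, |0_a 1_q\rangle - e^{-i\phi_j} \sqrt{\frac{p_j}{\sum_{i \le j} p_i}}\, |1_a 0_q\rangle\right)\langle 0_a 1_q|, \tag{D14}$$

in which the state $|1_a 1_q\rangle$, which is never populated under ideal operation, is left unchanged and the action on the (also ideally unpopulated) state $|0_a 1_q\rangle$ is determined by orthogonality. Written in the basis representation $\{|0_a 0_q\rangle, |0_a 1_q\rangle, |1_a 0_q\rangle, |1_a 1_q\rangle\}$, we find

$$[\hat{U}^{[j]}] = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta_j & e^{i\phi_j} \sin\theta_j & 0 \\ 0 & -e^{-i\phi_j} \sin\theta_j & \cos\theta_j & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \tag{D15}$$

in which $\cos\theta_j = \sqrt{\sum_{i < j} p_i / \sum_{i \le j} p_i}$ and so $\sin\theta_j = \sqrt{p_j / \sum_{i \le j} p_i}$. This gate has a natural interpretation as a rotation within the subspace of a single quantum of excitation shared between the qubit and ancilla, with the rotation angle set by the classical data vector probabilities [for $p_j \to 0$, $\theta_j \to 0$ and Eq. (D15) becomes the identity]. An analogous unitary completion for the isometry $\hat{L}^{[N-1]}$ is given by

$$\begin{aligned} \hat{U}^{[N-1]} = {} & \left(\cos\theta_{N-1}\, |1_a 0_q\rangle + e^{i\phi_{N-1}} \sin\theta_{N-1}\, |0_a 1_q\rangle\right)\langle 0_a 0_q| \\ & + \left(-e^{-i\phi_{N-1}} \sin\theta_{N-1}\, |1_a 0_q\rangle + \cos\theta_{N-1}\, |0_a 1_q\rangle\right)\langle 0_a 1_q| \\ & + \left(e^{i\phi_{N-1}} \sin\theta_{N-1}\, |1_a 1_q\rangle + \cos\theta_{N-1}\, |0_a 0_q\rangle\right)\langle 1_a 0_q| \\ & + \left(-e^{-i\phi_{N-1}} \sin\theta_{N-1}\, |0_a 0_q\rangle + \cos\theta_{N-1}\, |1_a 1_q\rangle\right)\langle 1_a 1_q|. \end{aligned} \tag{D16}$$

From a gate-based perspective, the operators in Eq. (D15) with φ = −π/2 are described by the Fermionic Simulation, or fSim(θ, ϕ) gate [80], with ϕ = 0 and $\theta = \theta_j$; this gate has recently been demonstrated in gmon qubits [81]. Alternatively, they are a one-parameter generalization of the iSWAP gate [82], which can be compiled using the gate sequences shown in Fig. 19. We note that the unitary completion $\hat{U}^{[j]}$ at $\phi_j = 0$ is given by $\hat{S}(\theta_j, \theta_j)$ in the notation of Eq. (C3), and so for the gate set employed by the IBMQ processors, the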


shortest decomposition for $\hat{U}^{[j]}$ is given by Eq. (C4). While in some alternative hardware platforms, such as those employing tunable qubits [83], iSWAP gates can be implemented natively, partial iSWAPs still require decomposition. We also note that the operation Eq. (D15) is generated by the effective Hamiltonian

$$\hat{H}^{[j]} = \theta_j \left(\hat{\sigma}_q^{+} \hat{\sigma}_a^{-} + \hat{\sigma}_q^{-} \hat{\sigma}_a^{+}\right), \tag{D17}$$

for “unit time” in the sense that

$$\hat{U}^{[j]} = \exp\left(-i \hat{H}^{[j]}\right), \tag{D18}$$

when $\phi_j = \pi/2$. This gate is readily achieved in trapped ion-based quantum computers using an equally weighted combination of XX and YY Mølmer-Sørensen gates [84], as well as a variety of other platforms implementing XY effective spin-spin interactions [85]. It is also interesting that the “data angle” $\theta_j$ has a natural interpretation as an ersatz “evolution time” in this perspective.

Before moving on from this exactly solvable example, we would like to point out how the freedom in representation of the bond basis manifests itself here. Namely, the predictions of the model Eqs. (24)–(26) are unchanged if we reverse the roles of the $|0_a\rangle$ and $|1_a\rangle$ ancilla states in all but the first and last steps of preparation (using the unitary freedom exploited in the transformation to diagonal gauge discussed in Sec. III A). In this case, we have the isometries

$$\hat{\tilde{L}}^{[N-1]} = \left(\sqrt{1 - p_{N-1}}\, |0_a 0_q\rangle + e^{i\phi_{N-1}} \sqrt{p_{N-1}}\, |1_a 1_q\rangle\right)\langle 0_a 0_q|, \tag{D19}$$

$$\hat{\tilde{L}}^{[j]} = |1_a 0_q\rangle\langle 1_a 0_q| \tag{D20}$$

$$\qquad + \left(e^{i\phi_j} \sqrt{\frac{p_j}{\sum_{i \le j} p_i}}\, |1_a 1_q\rangle + \sqrt{\frac{\sum_{i < j} p_i}{\sum_{i \le j} p_i}}\, |0_a 0_q\rangle\right)\langle 0_a 0_q|. \tag{D21}$$

The natural unitary completions of these isometries take the matrix representation

$$\left[\hat{\tilde{U}}^{[j]}\right] = \begin{pmatrix} \cos\theta_j & 0 & 0 & -e^{-i\phi_j} \sin\theta_j \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ e^{i\phi_j} \sin\theta_j & 0 & 0 & \cos\theta_j \end{pmatrix}, \tag{D22}$$

and so are described by $\hat{S}(-\theta_j, \theta_j)$ at $\phi_j = 0$, and are generated by the effective Hamiltonian

$$\hat{\tilde{H}} = \theta_j \left(\hat{\sigma}_q^{+} \hat{\sigma}_a^{+} + \hat{\sigma}_q^{-} \hat{\sigma}_a^{-}\right), \tag{D23}$$

at φ = π/2.

APPENDIX E: ISOMETRIES AND OPTIMIZED GATES FOR MNIST DATASET; χ = 8

This Appendix contains plots of the isometries of a χ = 8 MPS model for the MNIST example studied in Sec. V together with the compiled gate sequences obtained using the methods of Sec. III.

FIG. 20. Optimization for site 0. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.

FIG. 21. Optimization for site 1. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.

FIG. 22. Optimization for site 2. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.

FIG. 23. Optimization for site 3. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.

FIG. 24. Optimization for site 4. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.

FIG. 25. Optimization for site 5. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.


FIG. 26. Optimization for site 6. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.

FIG. 27. Optimization for site 7. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.

FIG. 28. Optimization for site 8. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 29. Optimization for site 9. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 30. Optimization for site 10. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 31. Optimization for site 11. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 32. Optimization for site 12. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 33. Optimization for site 13. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.

[Panel images and circuit diagrams for Figs. 26–33 are not reproduced.]


FIG. 34. Optimization for site 14. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.

FIG. 35. Optimization for site 15. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 36. Optimization for site 16. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 37. Optimization for site 17. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 38. Optimization for site 18. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 39. Optimization for site 19. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.


FIG. 40. Optimization for site 20. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.

FIG. 41. Optimization for site 21. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 42. Optimization for site 22. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 43. Optimization for site 23. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 44. Optimization for site 24. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 45. Optimization for site 25. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 46. Optimization for site 26. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 47. Optimization for site 27. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.


FIG. 48. Optimization for site 28. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.

FIG. 49. Optimization for site 29. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 50. Optimization for site 30. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 51. Optimization for site 31. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 52. Optimization for site 32. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 53. Optimization for site 33. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 54. Optimization for site 34. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 55. Optimization for site 35. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.


FIG. 56. Optimization for site 36. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 57. Optimization for site 37. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 58. Optimization for site 38. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 59. Optimization for site 39. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 60. Optimization for site 40. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.

FIG. 61. Optimization for site 41. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.

FIG. 62. Optimization for site 42. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.

FIG. 63. Optimization for site 43. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.


FIG. 64. Optimization for site 44. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 65. Optimization for site 45. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 66. Optimization for site 46. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.

FIG. 67. Optimization for site 47. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.

FIG. 68. Optimization for site 48. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.


[1] R. S. Smith, M. J. Curtis, and W. J. Zeng, A practical quantum instruction set architecture, arXiv:1608.03355 [quant-ph].
[2] D. S. Steiger, T. Häner, and M. Troyer, Projectq: An open source software framework for quantum computing, Quantum 2, 49 (2018).
[3] T. Häner, D. S. Steiger, K. Svore, and M. Troyer, A software methodology for compiling quantum programs, Quantum Sci. Technol. 3, 020501 (2018).
[4] G. Aleksandrowicz, T. Alexander, P. Barkoutsos, L. Bello, Y. Ben-Haim, D. Bucher, F. J. Cabrera-Hernández, J. Carballo-Franquis, A. Chen, C.-F. Chen, J. M. Chow, A. D. Córcoles-Gonzales, A. J. Cross, A. Cross, J. Cruz-Benito, C. Culver, S. D. L. P. González, E. D. L. Torre, D. Ding, E. Dumitrescu, I. Duran, P. Eendebak, M. Everitt, I. F. Sertage, A. Frisch, A. Fuhrer, J. Gambetta, B. G. Gago, J. Gomez-Mosquera, D. Greenberg, I. Hamamura, V. Havlicek, J. Hellmers, Ł. Herok, H. Horii, S. Hu, T. Imamichi, T. Itoko, A. Javadi-Abhari, N. Kanazawa, A. Karazeev, K. Krsulich, P. Liu, Y. Luh, Y. Maeng, M. Marques, F. J. Martín-Fernández, D. T. McClure, D. McKay, S. Meesala, A. Mezzacapo, N. Moll, D. M. Rodríguez, G. Nannicini, P. Nation, P. Ollitrault, L. J. O’Riordan, H. Paik, J. Pérez, A. Phan, M. Pistoia, V. Prutyanov, M. Reuter, J. Rice, A. R. Davila, R. H. P. Rudy, M. Ryu, N. Sathaye, C. Schnabel, E. Schoute, K. Setia, Y. Shi, A. Silva, Y. Siraichi, S. Sivarajah, J. A. Smolin, M. Soeken, H. Takahashi, I. Tavernelli, C. Taylor, P. Taylour, K. Trabing, M. Treinish, W. Turner, D. Vogt-Lee, C. Vuillot, J. A. Wildstrom, J. Wilson, E. Winston, C. Wood, S. Wood, S. Wörner, I. Y. Akhalwaya, and C. Zoufal, Qiskit, https://qiskit.org (2020), [Online; accessed 17-February-2020].
[5] R. LaRose, Overview and comparison of gate level quantum software platforms, Quantum 3, 130 (2019).
[6] Quantiki, List of QC Simulators, https://www.quantiki.org/wiki/list-qc-simulators (2020), [Online; accessed 17-February-2020].
[7] E. T. Campbell, B. M. Terhal, and C. Vuillot, Roads towards fault-tolerant universal quantum computation, Nature (London) 549, 172 (2017).
[8] P. W. Shor, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer, SIAM Rev. 41, 303 (1999).
[9] C. Gidney and M. Ekerå, How to factor 2048 bit rsa integers in 8 hours using 20 million noisy qubits, arXiv:1905.09749.
[10] A. D. Córcoles, A. Kandala, A. Javadi-Abhari, D. T. McClure, A. W. Cross, K. Temme, P. D. Nation, M. Steffen, and J. M. Gambetta, Challenges and opportunities of near-term quantum computing systems, Proc. IEEE 108, 1338 (2020).
[11] J. Preskill, Quantum computing in the nisq era and beyond, Quantum 2, 79 (2018).
[12] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Quantum machine learning, Nature (London) 549, 195 (2017).
[13] A. Perdomo-Ortiz, M. Benedetti, J. Realpe-Gómez, and R. Biswas, Opportunities and challenges for quantum-assisted machine learning in near-term quantum computers, Quantum Sci. Technol. 3, 030502 (2018).
[14] C. Ciliberto, M. Herbster, A. D. Ialongo, M. Pontil, A. Rocchetto, S. Severini, and L. Wossnig, Quantum machine learning: A classical perspective, Proc. R. Soc. A 474, 20170551 (2018).
[15] W. Huggins, P. Patil, B. Mitchell, K. B. Whaley, and E. M. Stoudenmire, Towards quantum machine learning with tensor networks, Quantum Sci. Technol. 4, 024001 (2019).
[16] I. Glasser, R. Sweke, N. Pancotti, J. Eisert, and J. I. Cirac, Expressive power of tensor-network factorizations for probabilistic modeling, with applications from hidden markov models to quantum machine learning, Advances in Neural Information Processing Systems 32, Proceedings of the NeurIPS 2019 Conference (2019).
[17] M. Benedetti, E. Lloyd, S. Sack, and M. Fiorentini, Parameterized quantum circuits as machine learning models, Quantum Sci. Technol. 4, 043001 (2019).
[18] V. Havlíček, A. D. Córcoles, K. Temme, A. W. Harrow, A. Kandala, J. M. Chow, and J. M. Gambetta, Supervised learning with quantum-enhanced feature spaces, Nature (London) 567, 209 (2019).
[19] U. Schollwöck, The density-matrix renormalization group in the age of matrix product states, Ann. Phys. 326, 96 (2011).
[20] R. Orús, A practical introduction to tensor networks: Matrix product states and projected entangled pair states, Ann. Phys. 349, 117 (2014).
[21] R. Orús, Tensor networks for complex quantum systems, Nat. Rev. Phys. 1, 538 (2019).
[22] E. Grant, M. Benedetti, S. Cao, A. Hallam, J. Lockhart, V. Stojevic, A. G. Green, and S. Severini, Hierarchical quantum classifiers, npj Quantum Inf. 4, 1 (2018).
[23] Z.-L. Xiang, S. Ashhab, J. Q. You, and F. Nori, Hybrid quantum circuits: Superconducting circuits interacting with other quantum systems, Rev. Mod. Phys. 85, 623 (2013).
[24] M. Schuld, A. Bocharov, K. M. Svore, and N. Wiebe, Circuit-centric quantum classifiers, Phys. Rev. A 101, 032308 (2020).
[25] Y. LeCun, C. Cortes, and C. J. Burges, Mnist handwritten digit database, ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist 2 (2010).
[26] E. Stoudenmire and D. J. Schwab, Supervised learning with tensor networks, in Advances in Neural Information Processing Systems (Curran Associates, Inc., Red Hook, NY, 2016), pp. 4799–4807.
[27] E. Farhi and H. Neven, Classification with quantum neural networks on near term processors, arXiv:1802.06002.
[28] M. Schuld and N. Killoran, Quantum Machine Learning in Feature Hilbert Spaces, Phys. Rev. Lett. 122, 040504 (2019).
[29] S. Lloyd, M. Schuld, A. Ijaz, J. Izaac, and N. Killoran, Quantum embeddings for machine learning, arXiv:2001.03622.
[30] M. Schuld, R. Sweke, and J. J. Meyer, Effect of data encoding on the expressive power of variational quantum-machine-learning models, Phys. Rev. A 103, 032430 (2021).
[31] R. LaRose and B. Coyle, Robust data encodings for quantum classifiers, Phys. Rev. A 102, 032420 (2020).
[32] S. R. White, Density Matrix Formulation for Quantum Renormalization Groups, Phys. Rev. Lett. 69, 2863 (1992).
[33] C. Schön, E. Solano, F. Verstraete, J. I. Cirac, and M. M. Wolf, Sequential Generation of Entangled Multiqubit States, Phys. Rev. Lett. 95, 110503 (2005).
[34] C. Schön, K. Hammerer, M. M. Wolf, J. I. Cirac, and E. Solano, Sequential generation of matrix-product states in cavity qed, Phys. Rev. A 75, 032311 (2007).
[35] D. Perez-Garcia, F. Verstraete, M. M. Wolf, and J. I. Cirac, Matrix product state representations, Quantum Inf. Comput. 7, 401 (2007).


[36] A. Cichocki, Tensor networks for big data analytics and large-scale optimization problems, arXiv:1407.3124.
[37] A. Cichocki, A.-H. Phan, Q. Zhao, N. Lee, I. V. Oseledets, M. Sugiyama, and D. Mandic, Tensor networks for dimensionality reduction and large-scale optimizations. part 2 applications and future perspectives, Found. Trends Mach. Learn. 9, 431 (2017).
[38] I. V. Oseledets, Tensor-train decomposition, SIAM J. Sci. Comput. 33, 2295 (2011).
[39] E. M. Stoudenmire, Learning relevant features of data with multi-scale tensor networks, Quantum Sci. Technol. 3, 034003 (2018).
[40] C. Guo, Z. Jie, W. Lu, and D. Poletti, Matrix product operators for sequence-to-sequence learning, Phys. Rev. E 98, 042114 (2018).
[41] J. Carrasquilla, G. Torlai, R. G. Melko, and L. Aolita, Reconstructing quantum states with generative models, Nat. Mach. Intell. 1, 155 (2019).
[42] G. Evenbly, Number-state preserving tensor networks as classifiers for supervised learning, arXiv:1905.06352.
[43] S. Klus and P. Gelß, Tensor-based algorithms for image classification, Algorithms 12, 240 (2019).
[44] S. Cheng, L. Wang, T. Xiang, and P. Zhang, Tree tensor networks for generative modeling, Phys. Rev. B 99, 155131 (2019).
[45] D. Liu, S.-J. Ran, P. Wittek, C. Peng, R. B. García, G. Su, and M. Lewenstein, Machine learning by unitary tensor network of hierarchical tree structure, New J. Phys. 21, 073059 (2019).
[46] I. Glasser, N. Pancotti, and J. I. Cirac, From probabilistic graphical models to generalized tensor networks for supervised learning, IEEE Access 8, 68169 (2020).
[47] M. Trenti, L. Sestini, A. Gianelle, D. Zuliani, T. Felser, D. Lucchesi, and S. Montangero, Quantum-inspired machine learning on high-energy physics data, arXiv:2004.13747.
[48] T.-D. Bradley, E. M. Stoudenmire, and J. Terilla, Modeling sequences with quantum states: A look under the hood, Mach. Learn.: Sci. Technol. 1, 035008 (2020).
[49] E. Gillman, D. C. Rose, and J. P. Garrahan, A tensor network approach to finite markov decision processes, arXiv:2002.05185.
[50] J. Miller, G. Rabusseau, and J. Terilla, Tensor networks for language modeling, arXiv:2003.01039.
[51] R. Selvan and E. B. Dam, Tensor networks for medical image classification, in Proceedings of the Third Conference on Medical Imaging with Deep Learning (2020).
[52] J. Wang, C. Roberts, G. Vidal, and S. Leichenauer, Anomaly detection with tensor networks, arXiv:2006.02516.
[53] J. Reyes and M. Stoudenmire, A multi-scale tensor network architecture for classification and regression, arXiv:2001.08286.
[54] M. Lubasch, J. Joo, P. Moinier, M. Kiffner, and D. Jaksch, Variational quantum algorithms for nonlinear problems, Phys. Rev. A 101, 010301(R) (2020).
[55] A. S. Kardashin, A. V. Uvarov, and J. D. Biamonte, Quantum machine learning tensor network states, Front. Phys. 8, 644 (2021).
[56] A. V. Uvarov, A. S. Kardashin, and J. D. Biamonte, Machine learning phase transitions with a quantum processor, Phys. Rev. A 102, 012415 (2020).
[57] Z.-Y. Han, J. Wang, H. Fan, L. Wang, and P. Zhang, Unsupervised Generative Modeling Using Matrix Product States, Phys. Rev. X 8, 031012 (2018).
[58] S. Efthymiou, J. Hidary, and S. Leichenauer, Tensornetwork for machine learning, arXiv:1906.06329.
[59] Z.-Z. Sun, S.-J. Ran, and G. Su, Tangent-space gradient optimization of tensor network for machine learning, Phys. Rev. E 102, 012152 (2020).
[60] Y. Levine, O. Sharir, N. Cohen, and A. Shashua, Bridging Many-Body Quantum Physics and Deep Learning Via Tensor Networks, Phys. Rev. Lett. 122, 065301 (2019).
[61] J. Chen, S. Cheng, H. Xie, L. Wang, and T. Xiang, Equivalence of restricted boltzmann machines and tensor network states, Phys. Rev. B 97, 085104 (2018).
[62] I. Glasser, N. Pancotti, M. August, I. D. Rodriguez, and J. I. Cirac, Neural-Network Quantum States, String-Bond States, and Chiral Topological States, Phys. Rev. X 8, 011006 (2018).
[63] M. Foss-Feig, D. Hayes, J. M. Dreiling, C. Figgatt, J. P. Gaebler, S. A. Moses, J. M. Pino, and A. C. Potter, Holographic quantum algorithms for simulating correlated spin systems, arXiv:2005.03023.
[64] J. M. Pino, J. M. Dreiling, C. Figgatt, J. P. Gaebler, S. A. Moses, C. H. Baldwin, M. Foss-Feig, D. Hayes, K. Mayer, C. Ryan-Anderson et al., Demonstration of the qccd trapped-ion quantum computer architecture, arXiv:2003.01293.
[65] B. Coyle, D. Mills, V. Danos, and E. Kashefi, The born supremacy: Quantum advantage and training of an ising born machine, npj Quantum Inf. 6, 1 (2020).
[66] A. JavadiAbhari, S. Patil, D. Kudrow, J. Heckey, A. Lvov, F. T. Chong, and M. Martonosi, Scaffcc: A framework for compilation and analysis of quantum computing programs, in Proceedings of the 11th ACM Conference on Computing Frontiers (Association for Computing Machinery, New York, NY, 2014), pp. 1–10.
[67] M. Amy and V. Gheorghiu, staq-a full-stack quantum processing toolkit, Quantum Sci. Technol. 5, 034016 (2020).
[68] A. JavadiAbhari, S. Patil, D. Kudrow, J. Heckey, A. Lvov, F. T. Chong, and M. Martonosi, Scaffcc: Scalable compilation and analysis of quantum programs, Parallel Comput. 45, 2 (2015).
[69] M. G. Davis, E. Smith, A. Tudor, K. Sen, I. Siddiqi, and C. Iancu, Heuristics for quantum compiling with a continuous gate set, Presented at the 3rd International Workshop on Quantum Compilation as part of the International Conference On Computer Aided Design 2019 (2019).
[70] M. Soeken, F. Mozafari, B. Schmitt, and G. De Micheli, Compiling permutations for superconducting qpus, in 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE) (IEEE, New York, NY, 2019), pp. 1349–1354.
[71] R. Iten, R. Colbeck, I. Kukuljan, J. Home, and M. Christandl, Quantum circuits for isometries, Phys. Rev. A 93, 032318 (2016).
[72] F. Vatan and C. Williams, Optimal quantum circuits for general two-qubit gates, Phys. Rev. A 69, 032315 (2004).
[73] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. Jarrod Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, İ. Polat, Yu. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, E. A. Quintero, C. R. Harris, A. M. Archibald, A. H. Ribeiro, F. Pedregosa, P. van Mulbregt, and SciPy 1.0 Contributors, SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python, Nat. Methods 17, 261 (2020).
[74] B. Efron, The Jackknife, the Bootstrap and other Resampling Plans (SIAM, Philadelphia, PA, 1982).


[75] S. Bravyi, S. Sheldon, A. Kandala, D. C. Mckay, and J. M. Gambetta, Mitigating measurement errors in multi-qubit experiments, arXiv:2006.14044.
[76] B. Nachman, M. Urbanek, W. A. de Jong, and C. W. Bauer, Unfolding quantum computer readout noise, npj Quantum Inf. 6, 84 (2020).
[77] J. Martyn, G. Vidal, C. Roberts, and S. Leichenauer, Entanglement and tensor networks for supervised image classification, arXiv:2007.06082.
[78] Y.-Y. Shi, L.-M. Duan, and G. Vidal, Classical simulation of quantum many-body systems with a tree tensor network, Phys. Rev. A 74, 022320 (2006).
[79] L. Cincio, K. Rudinger, M. Sarovar, and P. J. Coles, Machine learning of noise-resilient quantum circuits, Phys. Rev. X Quantum 2, 010324 (2021).
[80] I. D. Kivlichan, J. McClean, N. Wiebe, C. Gidney, A. Aspuru-Guzik, G. K.-L. Chan, and R. Babbush, Quantum Simulation of Electronic Structure with Linear Depth and Connectivity, Phys. Rev. Lett. 120, 110501 (2018).
[81] B. Foxen, C. Neill, A. Dunsworth, P. Roushan, B. Chiaro, A. Megrant, J. Kelly, Z. Chen, K. Satzinger, R. Barends et al., Demonstrating a Continuous Set of Two-Qubit Gates for Near-Term Quantum Algorithms, Phys. Rev. Lett. 125, 120504 (2020).
[82] N. Schuch and J. Siewert, Natural two-qubit gate for quantum computation using the xy interaction, Phys. Rev. A 67, 032301 (2003).
[83] M. Kjaergaard, M. E. Schwartz, J. Braumüller, P. Krantz, J. I.-J. Wang, S. Gustavsson, and W. D. Oliver, Superconducting qubits: Current state of play, Annu. Rev. Condens. Matter Phys. 11, 369 (2020).
[84] A. Sørensen and K. Mølmer, Entanglement and quantum computation with ions in thermal motion, Phys. Rev. A 62, 022311 (2000).
[85] T. Tanamoto, Y.-x. Liu, X. Hu, and F. Nori, Efficient Quantum Circuits for One-Way Quantum Computing, Phys. Rev. Lett. 102, 100501 (2009).
