0% found this document useful (0 votes)
17 views13 pages

A Quantum-Classical Collaborative Training Architecture Based On Quantum State Fidelity

Uploaded by

jaa
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
17 views13 pages

A Quantum-Classical Collaborative Training Architecture Based On Quantum State Fidelity

Uploaded by

jaa
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 13

This article has been accepted for publication in IEEE Transactions on Quantum Engineering.

This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TQE.2024.3367234

Digital Object Identifier

A Quantum-Classical Collaborative
Training Architecture Based on Quantum
State Fidelity
RYAN L’ABBATE1 , ANTHONY D’ONOFRIO JR.1 , SAMUEL STEIN2 , SAMUEL YEN-CHI
CHEN3 , ANG LI2 , PIN-YU CHEN4 , JUNTAO CHEN1 , YING MAO1
1
Computer and Information Science Department, Fordham University, New York, USA. E-mail: {rlabbate, adonofrio10, jchen504, ymao41}@fordham.edu
2
Pacific Northwest National Laboratory, Richland, WA, USA. Email: {samuel.stein, ang.li}@pnnl.gov
3
Brookhaven National Laboratory, Upton, NY, USA. Email:{[email protected]}
4
IBM Research, Yorktown Heights, NY, USA. Email:{[email protected]}
Corresponding author: Ying Mao (e-mail: [email protected]).
This research was supported in part by the National Science Foundation (NSF) under grant agreements 2329020, 2335788, and 2301884. It
was also supported in part by the U.S. Department of Energy (DOE) through the Office of Advanced Scientific Computing Research’s
“Advanced Memory to Support Artificial Intelligence for Science”. PNNL is operated by Battelle for the DOE under Contract
DE-AC05-76RL01830.

ABSTRACT Recent advancements have highlighted the limitations of current quantum systems, particu-
larly the restricted number of qubits available on near-term quantum devices. This constraint greatly inhibits
the range of applications that can leverage quantum computers. Moreover, as the available qubits increase,
the computational complexity grows exponentially, posing additional challenges. Consequently, there is
an urgent need to use qubits efficiently and mitigate both present limitations and future complexities.
To address this, existing quantum applications attempt to integrate classical and quantum systems in a
hybrid framework. In this study, we concentrate on quantum deep learning and introduce a collaborative
classical-quantum architecture called co-TenQu. The classical component employs a tensor network for
compression and feature extraction, enabling higher-dimensional data to be encoded onto logical quantum
circuits with limited qubits. On the quantum side, we propose a quantum-state-fidelity-based evaluation
function to iteratively train the network through a feedback loop between the two sides. co-TenQu has
been implemented and evaluated with both simulators and the IBM-Q platform. Compared to state-of-
the-art approaches, co-TenQu enhances a classical deep neural network by up to 41.72% in a fair setting.
Additionally, it outperforms other quantum-based methods by up to 1.9 times and achieves similar accuracy
while utilizing 70.59% fewer qubits.

INDEX TERMS Quantum Deep Learning, Quantum-Classical Hybrid Systems, Collaborative Training

I. INTRODUCTION the search for alternative computing approaches capable of


managing the ever-growing computational needs.
Recent years have witnessed significant progress in machine
learning and deep learning. Groundbreaking models and Quantum computing provides considerable potential in de-
algorithms have significantly enhanced our capabilities to livering the increased computational power essential to meet
identify patterns and process data in areas such as computer the expanding demands of deep learning. Classical comput-
vision, natural language processing, and finance. However, ers employ binary bits, representing either 0 or 1, which con-
this accelerated development has led to an exponential in- stitute the current computing standard. In contrast, quantum
crease in the computational power needed to execute in- computers use quantum bits (or qubits), which are probabilis-
creasingly sophisticated deep learning tasks. As the era of tic combinations of 0 and 1, achieved through quantum super-
Moore’s Law comes to a close, however, the acceleration of position and entanglement. As a result, the expected value
computational demand is starting to surpass the growth in of a qubit measurement can represent any number between
available computing power [1]. Consequently, this trend fuels 0 and 1. Therefore, a specific number of qubits can exhibit

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Transactions on Quantum Engineering. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TQE.2024.3367234

L’Abbate et al.: A Quantum-Classical Collaborative Training Architecture Based on Quantum State Fidelity

substantially greater representational power compared to an sical tensor network (TN) into the feature extraction stage
equivalent number of classical bits. In 1998, the first quantum to facilitate dimensionality reduction. Specifically, the TN
computer capable of executing computations was developed serves as a trainable module designed to capture high-level
[2]. The IBM-Q Experience was introduced in 2016, granting abstractions of the input data, the output of which is sub-
developers access to state-of-the-art quantum resources [3]. sequently fed into a variational quantum circuit (VQC) for
In 2020, Google AI demonstrated that a 53-qubit quantum classification purposes. Furthermore, we employ a quantum-
computer could complete a task in 200 seconds that would state-fidelity based cost function to train the model directly
require a classical computer more than 10,000 years. This on qubits’ states. Our proposed solution presents signifi-
advantage of quantum computing over classical computing is cant advantages over existing techniques, such as Principal
frequently referred to as "quantum supremacy" [4]. Component Analysis (PCA), which lacks trainability, and
Researchers inspired by the concept of quantum conventional neural networks that require a considerable
supremacy are actively exploring methods to convert clas- number of parameters to be optimized or pre-trained. The
sical algorithms into their quantum versions, aiming to integration of our hybrid system enables more efficient data
achieve significant reductions in time complexity compared encoding, thereby enhancing the overall performance of the
to classical counterparts. Quantum speed-ups have already quantum machine learning pipeline. The main contributions
been demonstrated for Shor’s algorithm [5] , which addresses are summarized as follows.
prime factorization and discrete logarithms, and Grover’s
• We propose co-TenQu, a quantum-classical collabo-
algorithm, which tackles database searches [6]. Quantum
rative training architecture. On the classical part, it
computing can be applied to machine learning tasks by
employs tensor network layers for data pre-processing
employing variational quantum circuits—quantum circuits
and preparation. In the quantum part, it utilizes a pre-
with trainable parameters. Specific areas within classical
processed dataset to build circuits with fewer qubits to
learning, such as Deep Learning and Support Vector Ma-
reduce the overall qubit requirement and noise interfer-
chines, could potentially benefit from quantum computing
ence.
[7], [8]. Quantum speed-ups have been achieved for several
• We introduce a quantum state fidelity based cost func-
algorithms, including expectation maximization solving [9]
tion. Instead of converting back to classical states,
(where the algorithm’s speed has been increased to sub-
co-TenQu train the model directly on quantum states
linear time [10]), Support Vector Machines [11], and natural
aiming at accelerating the training process and improv-
language processing [12].
ing performance.
However, in the noisy intermediate-scale quantum (NISQ)
• We implement co-TenQu with popular quantum toolk-
era, the qubits are both limited in number and subject to
its, e.g., Qiskit and PennyLane, and compare it with
noise. For instance, IBM-Q provides only 5-7 qubit machines
state-of-the-art solutions in the literature, by up to 1.9x
to the public. Furthermore, as the qubit count increases, the
and 70.59% less quantum resources. Additionally, we
computational complexity of the system grows exponentially
conduct proof-of-concept experiments on 14 different
[13], which leads to a higher overall noise level in a quan-
IBM-Q quantum machines.
tum machine. In the context of deep learning, an increased
number of qubits may employ a greater number of gates,
potentially augmenting circuit depth and noise interference. II. RELATED WORK
Consequently, it is crucial to efficiently and reliably utilize Recent developments [19]–[23] in quantum computing
the representational power of qubits through effective en- show great potential to enhance current learning algorithms
coding, making quantum algorithms more feasible on both through utilization of the qubit, the unit of quantum infor-
current and NISQ quantum computers, while mitigating the mation. In this field, quantum neural networks (QNN) have
surge in computational complexity as the number of qubits emerged as a promising research area in quantum machine
increases. A potential solution to data encoding challenges learning [24], [25]. Due to the limited quantum resources
involves performing classical pre-processing of the data for available, most of the existing works focused on numerical
compression and/or feature extraction. One prevalent method analysis or datasets with lower dimensionalities [17], [26],
for dimension reduction is Principal Component Analysis [27], such as MNIST [28].
(PCA), as demonstrated in prior works [14]–[18]. However, Farhi et al. [29] introduced a QNN for binary classifica-
PCA may not possess the representational power necessary to tion, which utilizes quantum entanglement to enhance the
compress data accurately. More sophisticated methods, such model’s computational power. In addition, quantum circuit
as employing neural network layers, demand substantial pre- learning [30], [31] developed a quantum-classical hybrid
training and significantly increase the number of parameters algorithm. They employed an iterative optimization of the
requiring tuning. Therefore, there is a pressing need for parameters to circumvent the high-depth circuit. Moreover,
efficient data compression techniques tailored to quantum Stokes et al. [32] presented a novel method for gradient
machine learning. descent using quantum circuits, enabling the optimization
In this work, we introduce a novel classical-quantum col- of variational quantum circuits in a manner analogous to
laborative training architecture, which incorporates a clas- classical neural networks. However, these solutions focused
2 VOLUME 4, 2023

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Transactions on Quantum Engineering. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TQE.2024.3367234

L’Abbate et al.: A Quantum-Classical Collaborative Training Architecture Based on Quantum State Fidelity

on theoritical analysis and only numerical experiments were distribution for the circuit results. Calculations are performed
provided. by manipulating the probability distributions of qubits. 0 and
In NISQ era, QCNN [33] suggests a design for a quantum 1 can be represented in vector notation as seen in Equation 1.
convolutional neural network that uses O(log(N )) trainable Quantum systems are often described using ⟨bra| |ket⟩
parameters for N dimensional inputs and can be realized on notation, where ⟨bra| and |ket⟩ represent horizontal and ver-
near-term quantum computers. Additionally, QuCNN [34] tical quantum state vectors, respectively. Because a qubit is a
employs an entanglement based backpropagation for NISQ mixture of 0 and 1, qubit states are described mathematically
machines. Jiang et al. [35] proposed a co-design framework as a linear combination of |0⟩ and |1⟩ as seen in Equation 1
named QuantumFlow, which features quantum-friendly neu- and 2.
ral networks, a mapping tool to generate quantum circuits,      
and an execution engine. However, QuantumFlow requires 1 0 α
|0⟩ = , |1⟩ = , |Ψ⟩ = (1)
local training of the network prior to mapping to quantum cir- 0 1 β
cuits, which leads to sensitivity to noise when implemented
on real quantum computers as opposed to simulations. |Ψ⟩ = α|0⟩ + β|1⟩ (2)
Expanding upon the use of quantum operations to perform
distance measurements, Stein et. al proposed the QuClassi This linear combination of qubit states is referred to as a
system: a hybrid quantum-classical system with a quantum- qubit’s statevector. |0⟩ and |1⟩ are orthonormal vectors in an
state-fidelity based loss function [14], [15]. QuClassi was eigenspace. In Equation 2, |Ψ⟩ represents the qubit state, a
able to provide improvements in accuracy compared to other probabilistic combination of |0⟩ and |1⟩.
contemporary quantum-based solutions such as TensorFlow The tensor product of qubit states can be used to describe
Quantum [36] and QuantumFlow. The QuClassi system the quantum states of multiple qubits. The tensor product
demonstrated success in both binary and multi-class clas- between the qubits shown in Equations 2 and 3 can be
sification. It used Principal Component Analysis (PCA) to described using Equation 4.
compress dataset classically. However, PCA fails to fully
utilize the classical resources by providing trainable layers. |Φ⟩ = γ|0⟩ + ω|1⟩ (3)
TN-VQC [37] proposed the use of tensor networks for feature
extraction and data compression to achieve higher classi-
fication accuracy for variational quantum circuits. Tensor |ΨΦ⟩ = |Ψ⟩ ⊗ |Φ⟩ = γα|00⟩ + ωα|01⟩ + γβ|10⟩ + ωβ|11⟩
networks do provide the advantage of having fewer pa- (4)
rameters compared to neural networks while still providing |0⟩ and |1⟩ represent opposite points of the sphere on
some trainability unlike PCA. TN-VQC employed a circuit the z axis. Measurements of qubit states can be taken with
architecture involving CNOT gates rather than CSWAP gates respect to any basis, but convention typically dictates that
like QuClassi. measurements are taken against the z-axis. However, the x-
This paper proposes co-TenQu, a hybrid quantum-classical axis, y-axis, or any pair of opposite points on the sphere could
architecture for deep neural networks. Comparing with exist- potentially be used as a basis of measurement. Quantum
ing literature, it utilizes a quantum-state fidelity based cost states are responsible for encoding data, and to perform op-
function to train the quantum section directly on qubits’ erations on quantum states quantum gates are used. Quantum
states. Additionally, tensor networks are employed to fully gates apply a transformation over a quantum state into some
exploit classical resources to compensate for the limitations new quantum state.
(e.g., low qubit count and noises) of quantum resources.
Through a collaborative training process, co-TenQu is able B. QUANTUM GATES
to outperform state-of-the-arts. Similar to classical data which is manipulated and encoded
using gates, quantum data is manipulated and encoded using
III. BACKGROUND quantum gates. Quantum gates can either perform a rotation
In this section, we present the background that is necessary about an axis or perform an operation on a qubit based on the
for designing our solution. value of another qubit. These are referred to as rotation gates
and controlled gates respectively.
A. QUANTUM COMPUTING BASICS
1) A Qubit and its superposition 1) Single-Qubit Gates
Classical computing uses bits that are binary in nature and A common type of single-qubit operations are the rotation
measure either 0 or 1. Quantum computing uses quantum gates. These gates perform qubit rotations by parameterized
bits or qubits. Qubits, unlike classical bits, are a probabilistic amounts. The generalized single-rotation gate R is shown in
mixture of 0 and 1. This mixture of 0 and 1 is known as a matrix form in Equation 5.
superposition. Upon measurement, the qubit in superposition
will collapse to either a value of 0 or 1. Quantum circuits are cos θ2 −ie−iϕ sin θ2
 
often run many times, using the results to get a probability R(θ, ϕ) = (5)
−ie−iϕ sin θ2 cos θ2
VOLUME 4, 2023 3

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Transactions on Quantum Engineering. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TQE.2024.3367234

L’Abbate et al.: A Quantum-Classical Collaborative Training Architecture Based on Quantum State Fidelity

Three commonly-used special cases of this gate are the 4) Controlled Gates
RX , RY , and RZ gates. These gates represent rotations in There are also two-qubit gates which utilize a control qubit
the x, y, and z plane and are expressed in Equations 6, 7, and and a target qubit. These gates, known as controlled gates,
8. RX and RY can be thought of as special cases of the R perform an operation on a target qubit depending on the value
gate in which ϕ = 0 and ϕ = π2 respectively. Therefore, of the control qubit.
RX (θ) is a rotation about the x-axis by angle θ and RY (θ) is CNOT Gate The CNOT gate is an example of a two-qubit
a rotation about the y-axis by angle θ. The derivation of RZ gate used in quantum computing. The CNOT gate flips the
from the general rotation gate is less straightforward and thus value of the target qubit if the control qubit is measured as
is not included here. 1 and does nothing otherwise. The CNOT gate can be seen
represented in matrix form below.
cos θ2 −i sin θ2
 
RX (θ) = = R(θ, 0) (6)
−i sin θ2 cos θ2
 
1 0 0 0
 0 0 0 1 
CN OT =  (13)
cos θ2 − sin θ2
  
π  0 0 1 0 
RY (θ) = = R(θ, ) (7)
sin θ2 cos θ2 2 0 1 0 0
" −iθ
# Fig.1 depicts the circuit notation for the CNOT gate. q0 is
e 2 0 the control qubit and q1 is the target qubit.
RZ (θ) = −iθ (8)
0 e 2

2) Hadamard Gate
A fundamental gate of quantum computation is the
Hadamard gate. It is a single-qubit gate puts a qubit into
superposition as described in Section III-A1. It can be ex- FIGURE 1: CNOT Gate Circuit Notation
pressed in matrix shown in equation 9. The √12 coefficient
is due to the fact that the sum of the squares of the state Controlled Rotation Gates Equations 14, 15 and 16 are con-
amplitudes must add to 1, so each state has a probability of 12 trolled rotation gates in matrix notation. Controlled rotation
and an amplitude of √12 . gates are similar to the CNOT gate but apply a rotation when
1
  the control qubit measures 1 instead of flipping the state. This
1 1
H=√ (9) allows for variable levels of entanglement between qubits.
2 1 −1
 
1 0 0 0
3) Two-Qubit Gates  0 1 0 0 
There are also operations that function as two-qubit rotations CRX (θ) =   0 0 cos θ θ 
 (14)
2 − sin 2
which perform an equal rotation on two qubits. These gates θ θ
0 0 − sin 2 cos 2
are described in Equations 10, 11, and 12. Note that these
gates are expressed as 4x4 matrices while the single-qubit 
1 0 0 0

gates were 2x2 matrices. This is because for a two-qubit  0 1 0 0 
gate, each individual qubit has two possible measurements, CRY (θ) =   0 0 cos θ − sin θ 
 (15)
2 2
yielding four possible results (|00⟩,|01⟩,|10⟩,|11⟩) rather than 0 0 sin θ2 cos θ2
two as seen previously for the single-qubit gates.
 
1 0 0 0
cos θ2 −i sin θ2
 
0 0  0 1 0 0 
0 cos θ2 −i sin θ2 0 CRZ (θ) = 
 0 0 iθ  (16)

RXX (θ) = 

 e2 0 
 0 −i sin θ2 cos θ2 0  iθ
0 0 0 e2
−i sin θ2 0 0 θ
cos 2
(10)
C. CONTROLLED SWAP GATE
Another type of controlled gate is the controlled SWAP
cos θ2 i sin θ2
 
0 0
 0 cos θ2 −i sin θ2 0  gate. The SWAP gate measures the difference between two
RY Y (θ) =   (11) quantum states and outputs the result to an ancilla qubit.
 0 −i sin θ2 cos θ2 0 
i sin θ2 0 0 θ
cos 2 Therefore, this gate is a three-qubit gate. The SWAP test
output values range from 0.5 to 1. Maximally different (or-
 θ 
e−i 2 0 0 0 thogonal) states will measure 1 with 50% probability while
θ

0 e−i 2 0 0
 identical states will measure 1 with 100% probability. The
RZZ (θ) =  (12)
 
−i θ SWAP test gate can be used to measure quantum state fidelity.

 0 0 e 2 0 
0 0 0 −i θ
e 2 The controlled swap gate is described in Equations 17 and 18.
4 VOLUME 4, 2023

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Transactions on Quantum Engineering. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TQE.2024.3367234

L’Abbate et al.: A Quantum-Classical Collaborative Training Architecture Based on Quantum State Fidelity

dimension. The data is then converted from classical data


into quantum data through a quantum data encoding method,
CSW AP (q0 , q1 , q2 ) = |0⟩⟨0| ⊗ I ⊗ I + |1⟩⟨1| ⊗ SW AP
as outlined in Section IV-A. This results in a quantum data
(17)
set represented by quantum state preparation parameters. For
 1 each predictable class in the data set, a quantum state is
0 0 0 0 0 0 0 
initialized with the same qubit count as the number of qubits
 0 1 0 0 0 0 0 0 
 0
 0 1 0 0 0 0 0 
 in the classical quantum data set, due to the constraints of the
 0 0 0 0 0 1 0 0  SWAP test. The quantum states, along with quantum classical
CSW AP (q0 , q1 , q2 ) = 
 0

 0 0 0 1 0 0 0 
 data, are then used to generate a logical quantum circuit and
 0 0 0 1 0 0 0 0 
sent to a quantum computer for further processing.
0 0 0 0 0 0 1 0
 
0 0 0 0 0 0 0 1 This initialization of state is the core architecture to
(18) co-TenQu. In this, a quantum circuit of a certain number of
Figure 2 depicts a swap test being performed. The ancilla layers representing a quantum deep neural network (detailed
qubit, q0 , is placed is superposition using a Hadamard gate. in Section IV-B) is prepared with randomly initialized param-
Then a swap test is performed between q1 and q2 and mea- eters containing a certain number of qubits. The produced
sured onto q0 . Another Hadamard gate is performed on the quantum state of this circuit is to be SWAP tested against
ancilla qubit. Finally, the ancilla qubit is then measured onto the quantum data point, which is fed back to the classical
a classical bit to obtain the result. computer and analyzed with quantum state fidelity based cost
function (described in Section IV-D), forming the overall
collaborative quantum-classical deep learning architecture of
co-TenQu.
The quantum computer calculates the quantum fidelity
from one ancilla qubit which is used to calculate model loss,
and sends this metric back to the classical computer. The
FIGURE 2: Swap Test Quantum Circuit classical computer uses this information to update the learn-
able parameters in attempts to minimize the cost function.
One advantage of the CSWAP gate is that it only re- This procedure of loading quantum states, measuring state
quires the measurement of the ancilla qubit. When qubits fidelity, updating states to minimize cost is iterated upon
are measured directly, their states collapse and the super- until the desired convergence or sufficient epochs have been
position is lost. The Swap test allows the superposition of completed.
the other qubits to be maintained by measuring the quantum
state fidelity through the ancilla qubit instead of measuring A. DATA ENCODING ON QUBITS
the qubits directly. Therefore, minimal information is lost When evaluating quantum machine learning architectures on
through measurement. classical datasets, it is crucial to have a method for translating
classical data into quantum states. One question that arises
D. QUANTUM ENTANGLEMENT is how to represent a classical dataset in a quantum setting.
A key principle of quantum computing is quantum entangle- Our architecture utilizes the expectation of a qubit to trans-
ment. A qubit’s state is said to be entangled when its mea- late traditional numerical data points. To achieve this, data
surement is dependent on the measurement of another qubit. x1 , x2 , ..., xn of dimension d can be mapped onto a quantum
This dependence allows information to be transferred be- setting by normalizing each dimension di to fall within the
tween qubits, even if they are not physically close together (a range of 0 to 1. This is because a qubit’s expectation can
phenomena sometimes referred to as "action at a distance"). only take on values within this range. In contrast to classical
When one entangled qubit is measured, the other entangled computing, which requires a string of bits to represent the
qubit’s state also collapses. For example, if two qubits are same number, encoding a single dimension data point only
entangled using the CNOT gate, after the state of one qubit requires one qubit. To translate the traditional value xi into
is measured, the state of the second entangled qubit can be some quantum state, we perform a rotation around the Y axis
predicted with absolute certainty. Quantum entanglement is parameterized by the following equation:
a key component of the quantum advantage over classical √
RY (θxi ) = 2sin−1 ( xi ) (19)
computing, as it is a property of quantum computing with
no classical equivalent. The RY (θxi ) operation results in the expectation of a qubit
being measured against the Z axis, corresponding to the xi
IV. SYSTEM DESIGN value from the classical data that the qubit encodes. Building
Our architecture employs a feedback loop between classical upon this concept, we can encode the second dimension of
computers and quantum computers, as illustrated in Fig. 3. data across the X-Y plane. To achieve this, we employ two
Initially, the is fed into tensor networks with a layers of parameterized rotations on one qubit initialized in state |0⟩
trainable parameters and output the data in a pre-configured to prepare classical data in the quantum setting. To encode a
VOLUME 4, 2023 5

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Transactions on Quantum Engineering. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TQE.2024.3367234

L’Abbate et al.: A Quantum-Classical Collaborative Training Architecture Based on Quantum State Fidelity

FIGURE 3: co-TenQu: A Quantum-Classical Collaborative Training Architecture

data point, we apply the necessary rotations across d2 qubits, applied to the remaining n/2 qubits. Additionally, there is
with each rotation parameterized by the normalized value of one ancilla qubit used for swap test measurements.
that data point’s corresponding dimension. It is worth noting
that the encoding of 2-dimensional data onto a single qubit B. QUANTUM LAYERS
may pose challenges for extreme values of x. However, we Similar to classical artificial neural networks, quantum cir-
explore the dual dimensional encoding as a possible method cuits can also be thought of as having layers. For a quantum
of reducing high qubit counts and evaluate the performance circuit, these layers would be comprised of quantum gates.
when each dimension of data is encoded into one respective In co-TenQu, we define three quantum layer styles: single-
qubit solely through a RY Gate. This approach is validated qubit unitary, dual-qubit unitary, and controlled-qubit unitary.
by the fact that we never measure any of our qubits, but only Each of these layer styles comprises rotations that serve as
their quantum fidelity through the SWAP test. As a result, the trainable parameters in our quantum machine learning
we can bypass the superposition-collapsing issue inherent in model. Defining these three types of layers enables system
this approach.We encode the second dimension of data on the design at a higher level than individual gates.
same qubit through the following rotation: Single-Qubit Unitary A single-qubit unitary layer involves
√ single-qubit rotations around the y-axis and z-axis (RY and
RZ(θxi+1 ) = 2sin−1 ( xi ) (20) RZ ). This allows for total manipulation of a qubit’s quantum
When dealing with a limited number of qubits, methods state. A single-qubit unitary layer is depicted in Figure 4.
that can reduce the number required are highly valuable.
Unlike classical computers, which utilize formats such as
integers and floats, classical data encoding in quantum states
does not have a tried and tested method. Therefore, our
approach may be subject to criticism. Nevertheless, our ap- FIGURE 4: Single Qubit Unitary
proach has been tested and proven to be a viable solution
to the problem at hand. Additionally, having knowledge of Dual-Qubit Unitary A dual-qubit untary layer involves
both the qubit’s expectation across the Y and Z domains dual-qubit rotations around the y and z axis (RY Y and RZZ ).
enables the reconstruction of classical data. Various methods The same y rotation and z rotation are applied to both qubits
for classical-to-quantum data encoding exist, ranging from involved. A dual-qubit unitary layer is depicted in Figure 5.
encoding 2n classical data points across n qubits using state-
vector encoding to encoding classical data into a binary
|0⟩ |ψ⟩ |0⟩ |ψ⟩
representation on quantum states by translating a vector of bi- RYY(θ) RZZ(θ) CRY(θ) CRZ(θ)
nary values onto qubits. The former method is highly suscep- |0⟩ |φ⟩ |0⟩ |φ⟩

tible to noise, whereas the latter loses significant information


in the process but is less susceptible to noise and exponential- FIGURE 5: Dual Qubit FIGURE 6: Entanglement
sampling problems. Exponential data-encoding methods also
exist and can be integrated into co-TenQu since it does not Entanglement-based Unitary A controlled-qubit unitary
directly perform quantum state tomography, making the data utilizes controlled rotation gates (CRY and CRZ ) to en-
encoding section scalable. tangle qubits.The use of these gates allows the level of
The co-TenQu quantum circuits consist of n + 1 qubits, entanglement between qubits to be trainable. In Figure 6, the
with n representing the dimension of the input data. The input top row is the control qubit and the bottom row is the target
data is encoded on n/2 qubits, while trainable parameters are qubit.
6 VOLUME 4, 2023

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Transactions on Quantum Engineering. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TQE.2024.3367234

L’Abbate et al.: A Quantum-Classical Collaborative Training Architecture Based on Quantum State Fidelity

Where θd is a collection of parameters defining a circuit, x


is the data set, ϕx(i) is the quantum state representation of
data point i, and ω is the state being trained to minimize the
function in Equation 22 and 23.
Optimization of the parameters θd requires us to perform
gradient descent on our cost function. We make use of the
FIGURE 7: co-TenQu with 3-layers and 5-qubits setting following modified parameterized quantum gate differentia-
tion formula outlined in Equation 24.
δCost 1 π π
The layers can be combine linearly to composite a multi- = (f (θi + √ ) − f (θi − √ )) (24)
δθi 2 2 ϵ 2 ϵ
layer mode. For example, as seen in Figure 7, the circuit
features three layer types: single-qubit unitary, dual-qubit, Where in Equation 24 θi is a parameter, Cost is the cost
unitary, and controlled-qubit unitary. function, and ϵ is the epoch number of training the circuit.
Our addition of the ϵ is targeted at allowing for a change
C. PARAMETER SHIFT in search-breadth of the cost landscape, shrinking constantly
Backpropagation is a necessary step for training any deep ensuring a local-minima is found.
neural network. Gradients for the parameters of quantum The gradients of quantum parameters can also be deter-
circuits cannot be calculated by the same methods used in mined using numerical methods. Equation 25 show a formula
classical backpropagation. Therefore, the gradients of the to numerically determine the gradients of quantum param-
parameters are calculated using parameter shift shown in eters. However, numerical methods can run into issues due
Equation 21. to the noise an error associated with current quantum com-
puters. Therefore, the gradients calculated may be inaccurate
∇θ f (θ) = 0.5 ∗ [f (θ + s) − f (θ − s)] (21) and lead to inefficiency in training [38].

With the parameter shift rule, the quantum circuit can be f (θ + s) − f (θ − s)


∇θ f (θ) = (25)
viewed as a black box and the gradient is calculated by 2s
obtaining circuit results when the parameter is increased or
E. HYBRID TENSOR NETWORK AND QUANTUM
decreased by a shift s. The difference in results can be used
CIRCUIT DESIGN
to obtain a gradient for the parameter.
A hybrid model with a Tensor Network and a quantum circuit
is used to classify 28x28 MNIST images. The Tensor Net-
D. STATE FIDELITY BASED COST FUNCTION
work functions as a trainable feature extractor to compress
When training a neural network to accomplish a task, an
the 784-dimensional data into 4 dimensions for classification
explicit description of system improvement goal needs to
by the quantum circuit.
be established - i.e the cost function. The quantum machine
There are several different types of tensor networks. For
learning cost function landscape can be slightly ambiguous
this study, the Matrix Product State (MPS) will be employed.
compared to classical machine learning, as we could be
The MPS, also referred to as a tensor train, is the simplest
manipulating the expected values of each qubit in some
type of tensor network. In a MPS, tensors are contracted
way. However, even this is ambiguous - the direction being
through virtual indices. The number of these indices is re-
measured in heavily affects the expectation value and or
ferred to as a bond dimension, denoted by χ. A greater bond
what our iteration count would be for measuring expectation,
dimension indicates a greater amount of quantum entangle-
with lower iterations leading to increasingly noisy outputs.
ment that can be represented and therefore more representa-
Within our system, we make use of the SWAP test to parse
tional power in the MPS. An N-dimensional input is mapped
quantum state fidelity to an appropriate cost function. One of
into a product state using the mapping shown in Equation 26.
the benefits of the SWAP test is that we only need to measure
This mapping for the MPS input is known as a feature map.
one ancilla qubit. In the case of binary classification, each
data point is represented in a quantum state represented by  π
  π
  π

cos x
2 1
cos x
2 2
cos x
2 N
|ϕ⟩, which is used to train the quantum state prepared by our x → |Ψ⟩ = π ⊗ π ⊗ ... ⊗ π
sin x
2 1
sin x
2 2
sin x
2 N
DL model |ω⟩ such that the state of |ω⟩ minimizes some cost (26)
function. The classical cross-entropy cost function outlined The MPS takes an input of size 784 (28 times 28) and
in Equation 23 is an appropriate measure for state fidelity, as outputs a n-length tensor. The output dimension of the MPS
we want the fidelity returned to be maximized in the case of is a hyperparameter of the system that can be adjusted based
Class=1, and minimized otherwise. on the problem at hand. This tensor output from the MPS
n is then encoded into a quantum circuit. n dimensions are
1X
min(Cost(θd , X) = SW AP (|ϕX(i) ⟩, |ω⟩) (22) encoded onto n2 qubits using an RY and RZ rotation on
n i=1
each qubit to encode two dimensions per qubit. Because the
Cost = −ylog(p) − (1 − y)log(1 − p) (23) output of the MPS is not bounded, the arctangent of the input
VOLUME 4, 2023 7

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Transactions on Quantum Engineering. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TQE.2024.3367234

L’Abbate et al.: A Quantum-Classical Collaborative Training Architecture Based on Quantum State Fidelity

values are encoded for the rotations to keep inputs to the trained on the data set, X. Line 6 represents the input data, x,
quantum circuit in the range of [- π2 , π2 ]. After encoding, the being encoded into the tensor network. Line 7 represents the
circuit is run to get a quantum state fidelity measurement. output of the tensor network being obtained through tensor
This measurement is then mapped from [0.5,1] to [0,1] by contractions. Lines 8-23 represent the process by which each
subtracting 0.5 and multiplying by 2. The swap test may of the quantum parameters θ is updated. The output of the
sometimes measure below 0.5 due to statistical error, so a tensor network and the trainable quantum circuit parameters
ReLu layer is applied after the quantum circuit to prevent θd are all loaded into the quantum circuit with one of the pa-
negative outputs. For multi-class classification, the ReLu rameters (θ) either increased by π2 (∆f wd ) and the SWAP test
layer is not used due to the presence of the softmax layer. If is performed. Then the parameters are reset, θ is decreased
the output is below 0.5, the image is classified as 0, otherwise by π2 (∆bck ), and the SWAP test is performed again. The
the image is classified as label 1. The quantum circuit has overall cost function of the network, f (θd ), is then obtained
up to three types of layers: single-qubit unitary, dual-qubit for the two adjusted parameter values and used to update θ
unitary, and controlled-qubit unitary. as seen in Line 22. After all of the quantum parameters have
For binary classification, a single quantum circuit is run. been updated, the parameters of the Tensor Network layer
For n-class classification where n > 2, n quantum circuits are updated as seen in Line 24. The quantum neural network
with the same circuit design, but different parameters are run is induced across all trained classes and the quantum state
in parallel. The outputs of these circuits are then softmaxed to fidelity outputs are softmaxed. The class with the highest
get probabilities for each class. The image is classified as the probability is returned as the classification.
class with the highest probability. System diagrams for the Algorithm 1 presents a hybrid training process that in-
binary and multi-class versions of this system can be seen in volves both classical and quantum ends, e.g., data loading
Figures 8 and 9, respectively and tensor networks on the classical side; quantum layers
and measurements on the quantum side. The time and space
complexity analysis should consider both quantum and clas-
sical resources. Due to the page limit and scope, we omit the
theoretical algorithm analysis in this paper.

V. EVALUATION
We utilized Python 3.9 and the IBM Qiskit Quantum Com-
puting simulator package to implement co-TenQu . The cir-
cuits were trained on NSF Cloudlab M510 nodes at the Uni-
versity of Utah datacenter. In our experiments, co-TenQu is
compared with state-of-the-art solutions listed below.
FIGURE 8: co-TenQu Diagram (Binary)
• PCA-QuClassi [39]: It is the predecessor of co-TenQu.
Instead of a collaborative quantum-classical training
framework, it utilizes principal component analysis
(PCA) to reduce the dimensions of the dataset. In our
evaluations, we use PCA-5, PCA-7 and PCA-17 to
denote its 5-qubit, 7-qubit and 17-qubit settings. Ad-
ditionally, PCA-QuClassi has been compared with its
different versions, including the Single Qubit Unitary
Layer, Dual Qubit Unitary Layer and Entanglement
Layer.
• QuntumFlow [40] (QF-pNet): It employs a co-design
framework of quantum neural networks and utilizes
FIGURE 9: co-TenQu Diagram (3-class) downsampling to reduce the dimensions along with the
amplitude encoding method.
This system can be trained all together at once rather than • TensorFlow Quantum [36] (TFQ): The example codes
requiring a feature extractor to be pre-trained. The entire provided by Tensorflow Quantum library are based on
training algorithm is summarized in Algorithm 1. First, the Cirq circuits and standard layer designs.
data is loaded as shown in Equation 26 (Line 1). Lines 2- • DNN-Fair [41]: A classical deep neural network for
3 involve introducing training parameters set by the user MNIST data may contain 1.2M parameters. For a more
at run time. The learning rate α indicates how large the fair comparison, we construct a deep neural network
updates to the system parameters should be during training. with 3145 parameters.
The network weights are initialized randomly. The number Furthermore, when comparing our co-TenQu architecture
of epochs ϵ indicates how many times the network will be to above-mentioned solutions in the literature of quantum
8 VOLUME 4, 2023

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Transactions on Quantum Engineering. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TQE.2024.3367234

L’Abbate et al.: A Quantum-Classical Collaborative Training Architecture Based on Quantum State Fidelity

Algorithm 1 co-TenQu Algorithm |0 |0


1: Data set Loading Dataset: (X|Class : M ixed)
2: Distribute Dataset X By Class
3: Parameter Initialization:
Learning Rate : α = 10−4 y y
Network Weights : θd = [Rand Num between 0 − 1 × π] x x
epochs : ϵ = 40 |1 |1
Dataset: (X|Class = ω)
Qubit Channels: Q = 2nXdim (a) Qubit 1 - 0 Epochs (b) Qubit 1 - 10 Epochs
FIGURE 10: Identify 0 (Epoch 1 vs 10).
4: for ζ ∈ ϵ do
5: for xk ∈ X do
π
 
cos 2 x1 
6: Encode in Tensor Network x → π ⊗ ducted both binary and multi-class experiments and evaluated
sin 2 x1

cos π2 x2 
 
cos π2 xN 
 them with simulators as well as IBM-Q quantum machines.
⊗ . . . ⊗
sin π2 x2 sin π2 xN
A. QUANTUM BINARY CLASSIFICATION
7: Perform Tensor contractions to get TN output In order to understand how our learning process works, we
8: for θ ∈ θd do visualized the training process of identifying a 0 against a
9: Perform Hadamard Gate on Q0 6 by looking at the final state that is passed to the SWAP
Quantum
10: Load xk −−−−−−−−−→ QQ1 → Qcount test. As illustrated in Fig. 10, an initial random quantum state
DataEncoding
Quantum is used to learn to classify 0 against 6. It is important to
11: Load θd −−−−−−−−−→ Q Qcount +1 + 1 →
DataEncoding 2 note that the state visualization does not account for potential
Qcount
2 + 1 learned entanglements, but serves as a visual aid to the
12: Add π2 → θ learning process. In Fig. 10, we can observe the evolution
13: ∆f wd = (EQ0 f (θd )) of the identifying state through epochs. The green arrows
14: CSWAP(Control Qubit = Q0 , Learned State indicate the deep learning final state, and the blue points
Qubit, Data Qubit) represent the training points. Initially, the identifying states
15: Measure Q0 are random, but they rotate and move towards the data,
16: Reset Q0 to |0⟩ gradually minimizing the cost.
17: Perform Hadamard Gate on Q0 For binary classifications, we adopted popular digit combi-
18: Subtract π2 → θ nations from the literature, specifically (1,5), (3,6), (3,8), and
19: CSWAP(Control Qubit = Q0 , Learned State (3,9). The binary classification results are compared and visu-
Qubit, Data Qubit) alized in Fig.11. Clearly, co-TenQu consistently outperforms
20: Measure Q0 all other solutions. For example, in the (1,5) classification
21: ∆bck = (EQ0 f (θd )) with MNIST dataset (as shown on Fig.11a), it achieves the
22: θ = θ − (0.5 ∗ (∆f wd − ∆bck )) × α largest improvement of 41.72% compared to classical deep
23: end for neural networks, DNN-Fair (3145 parameters), with an ac-
24: Update Tensor Network parameters curacy of 99.79%. While classical DNN can achieve perfect
25: end for accuracies on the MNIST dataset, it requires a much larger
26: end for parameter size. By introducing 5 qubits, co-TenQu is able
to achieve better or similar performance with 49.54% less
parameters.
deep learning, the MNIST dataset is a commonly used bench- When compared to quantum-based solutions with MNIST
mark. MNIST comprises hand-written digits of resolution dataset, co-TenQu outperforms others, with the largest mar-
28×28, resulting in 784 dimensions. However, the evaluation gin achieved in the (3,8) and (3,9) classification, where we
data-encoding technique makes it impractical to perform observe improvements of 35.07% and 30.71% over Tensor-
experiments on near-term quantum computers and simulators flow Quantum and QF-pNet. One noticeable thing is that,
due to the lack of qubits and computational complexity. As if we train Tensorflow Quantum with 17 qubits (verus 5
a result, we need to reduce the dimensionality to perform qubits), the accuracies increase substantially. For example,
practical experiments. Therefore, it is necessary to reduce the accuracy boosted to from 71.25% to 90.63%. The pri-
the dimensionality of the dataset. In our research, we have mary difference between the designs is that co-TenQu uti-
reduced the number of dimensions to 4 for binary experi- lizes a quantum-state based evaluation function that can
ments/simulations and 6 for multi-class evaluations. Besides directly train the network on qubits and provide stable results.
original MNIST dataset, our evaluation involves two derived co-TenQu also outperformed its predecessor, PCA-QuClassi
datasets, Fashion MNIST and Extended MNIST. We con- with MNIST dataset. While both employ a quantum-state
VOLUME 4, 2023 9

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Transactions on Quantum Engineering. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TQE.2024.3367234

L’Abbate et al.: A Quantum-Classical Collaborative Training Architecture Based on Quantum State Fidelity

co-TenQu PCA-5 QF-pNet TFQ DNN-Fair co-TenQu PCA-5 TFQ DNN-Fair co-TenQu PCA-5 TFQ DNN-Fair
1.0 1.0 1.0

0.8 0.8 0.8

Accuracy
Accuracy
0.6 0.6 0.6
Accuracy

0.4 0.4 0.4

0.2 0.2 0.2

0.0 0.0 0.0


1/5 3/6 3/8 3/9 1/5 3/6 3/8 3/9 1/5 3/6 3/8 3/9

(a) MNIST (b) Fashion MNIST (c) Extended MNIST


FIGURE 11: Binary Classifications with 5-qubit circuits for co-TenQu

based evaluation function, co-TenQu incorporates a new 1.0


trainable tensor network layer, allowing part of the training
job to be completed on the classical part of the collaborative 0.9
architecture.
0.8
A similar trend is discovered with both Fashion and
Extended MNIST datasets as illustrated on Fig.11b and 0.7
Fig.11c. We can see that co-TenQu outperforms all other
solutions in compared 2-digit combinations. Comparing the 0.6
results across 3 different datasets, TensorFlow Quantum’s co-TenQu
0.5 PCA-5
performance is not stable. For example, it achieves 62.58%,
84.08%, and 66.25% for (3,8) classification that is a 21.50% 0 5 10 15 20 25 30 35 40
difference between datasets. With co-TenQu , however, the FIGURE 12: 1/5 Extended MNIST Training
same value is 1.55% with 97.65%, 99.20%, and 98.54% for
original, Fashion and Extended MNIST datasets respectively.
co-TenQu also beats PCA-QuClassi with 5-qubit setting observe that co-TenQu consistently outperforms other so-
(shown as PCA-5 on the figures) in all binary combinations lutions. It achieves 97.39%, 98.94%, and 91.48% for the
with the largest gain, 26.58%, observes at (3,6) Fashion first three multi-class experiments. PCA-QuClassi with the
MNIST (Fig.11b). This is due to the fact that co-TenQu uti- same 7-qubit setting records 58.55%, 67.68%, and 62.02%.
lizes the classical computational resource to partially com- It demonstrates that co-TenQu gains superior performance
plete training and pre-process the data for quantum parts. improvement, up to 66.3%, by introducing the quantum-
Furthermore, we find that co-TenQu converges faster classical collaborative training architecture. When increase
than PCA-QuClassi when taking a closer look at the the qubits utilization of PCA-QuClassi to the 17-qubit setting
training processes. Fig.12 presents the accuracy per each (shown as PCA-17 on the figures), its performance boosts to
epoch of (1,5) classification on Extended MNIST dataset. 94.91%, 94.18%, and 92.49% such that co-TenQu wins the
co-TenQu reaches 93.75% at its their epoch, after which it first two experiments, but fails the last one by 1%. It further
increases 5.10% to 98.85% at the 40th epoch. Comparing proves that co-TenQu is able to achieve similar performance
with PCA-QuClassi with the 5 qubit setting, however, it with 70.59% less quantum resources (5 vs 17). Considering
records a 87.95% accuracy at the 18th epoch and climbs up 10-class experiment, co-TenQu performs significantly bet-
to 93.30% at the end, a 5.35% increase. Given the training ter PCA-QuClassi 7-qubit setting (73.21% vs 33.41%), but
process, co-TenQu converges significantly faster than PCA- slightly worse than its 17-qubit version by 5%. The reason
QuClassi as it leverages trainable layers on the classical part. lies in the fact that 17-qubit setting contains much more
information for the training.
B. QUANTUM MULTI-CLASS CLASSIFICATION When comparing with QF-pNet, co-TenQu im-
Next, we evaluate our solution with multi-class classifica- proves the accuracies in all experiments. For example,
tions. In these experiments, co-TenQu utilizes a 7-qubit co-TenQu achieves 97.39% and 98.94% for (0,3,6) and
setting. The results demonstrate that co-TenQu provides (1,3,6), comparing with 78.70% and 86.50% obtained by
substantially better multi-class classification accuracies when QF-pNet, which leads to accuracy increases of 23.75% and
comparing with the state-of-the-arts. With the multi-class 14.38%. In 5-class classification, co-TenQu gains 19.92%
classification, we select the popular digit combinations, (91.48% vs 71.56%). As the number of classes increase,
(0,3,6), (1,3,6), (0,1,3,6,9) and 10-class, in the literature. co-TenQu outperforms QF-pNet by more than 181.90%
The results are illustrated in Fig.13. On the figure, we (73.21% vs 25.97%) for 10-class classification. In Quan-
10 VOLUME 4, 2023

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Transactions on Quantum Engineering. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TQE.2024.3367234

L’Abbate et al.: A Quantum-Classical Collaborative Training Architecture Based on Quantum State Fidelity

co-TenQu PCA-7 PCA-17 QF-pNet TFQ


1.0
co-TenQu PCA-7 TFQ co-TenQu PCA-7 TFQ
1.0
0.8 0.8 0.8

Accuracy

Accuracy
0.6 0.6 0.6
Accuracy

0.4 0.4 0.4


0.2 0.2 0.2
0.0 0.0
0.0
0/3/6 1/3/6 0/1/3/6/9 10-Class 0/3/6 1/3/6 0/1/3/6/9 10-Class 0/3/6 1/3/6 0/1/3/6/9 10-Class
(a) MNIST (b) Fashion MNIST (c) Extended MNIST
FIGURE 13: Multi-class Classifications with 7-qubit circuits for co-TenQu

tumFlow (QF-pNet), most of the training is done on the specific biases.


classical computer, where the traditional loss function is in
use. With co-TenQu, however, we employ a quantum-state VI. DISCUSSION AND CONCLUSION
based evaluation function that can fully utilize the qubits and In this work, we propose co-TenQu, a collaborative quantum-
a collaborative training architecture. classic architecture for quantum neural networks. On the
We further compare co-TenQu with PCA-QuClassi under classical side, it utilizes a tensor network with trainable layers
the same 7-qubit setting with Fashion and Extended MNIST to preprocess the dataset to extract features and reduce the di-
datasets. The same trend can be found on the Fig.13b and mensionality. On the quantum part, it employs the quantum-
Fig.13c, where co-TenQu consistently outperforms its pre- state fidelity based cost function to train the model. Compar-
decessor. It achieves the largest gain on (1,3,6) classification ing to classical deep neural networks, co-TenQu achieves
with Extended MNIST that is 99.06% comparing with PCA- 41.72% accuracy improvement with a 49.54% reduction
7’s 50.90%. co-TenQu achieves stable performance on all 3- in the parameter count. Additionally, it outperforms other
class and 5-class classifications across different datasets. For quantum-based solutions, up to 1.9 times, in multi-class
example, the accuracies for (0,3,6), (1,3,6) and (0,1,3,6,9) on classification. Furthermore, it records similar performance
Extended MNIST are 98.16%, 99.06%, and 94.88%. With with 70.59% less quantum resources. co-TenQu represents a
the 10-class job, the values drop to 73.40% and 63.38% notable advancement in the realm of quantum deep learning.
for Fashion and Extended MNIST, respectively. However, However, there remains considerable room for progress. Due
co-TenQu utilizes merely 7 qubits and performs much better, to the limitations of current quantum machines, the existing
up to 1.90x, than PCA-QuClassi. solutions can only be evaluated on small dataset, such as
MNIST. In addition, the 10-class classification of MNIST
C. EXPERIMENTS ON IBM-Q PLATFORM resulted in a 73.21% accuracy, which is relatively modest
As a proof of concept, we evaluate co-TenQu on real quan- in comparison to classical counterparts. Although classical
tum computers through the IBM-Q platform. 300 data points methods employ a higher number of parameters, they achieve
of the (1,5) and (3,6) MNIST experiments are submitted to 14 accuracies approaching 100%, which highlights the potential
of IBM-Q’s superconducting quantum computers. Circuits benefits that quantum computing could offer.
are generated based off of a trained co-TenQu network, Our future research will concentrate on extending the
whereby 300 circuits are submitted per machine in one job quantum-state fidelity based cost function and collaborative
at 8192 shots each. The results are demonstrated in Fig.14. quantum-classical architecture to other applications, such as
Eight of the 14 machines yield 66.67% accuracy, which is the accuracy obtained by predicting all 0's (i.e., the ground state). Variational parameters learned in simulation can perform poorly on real machines, as problems such as temporal drift and machine-specific bias cause induction issues [42]. Among the tested machines, IBMQ-Lima achieved the best result, at 82.10%. Lima's topology is drawn in Fig. 15; it has a Quantum Volume of 8, one of the lowest among IBM machines. This highlights the difficulty of predicting machine performance for quantum routines, and the implications that temporal drift has for learned parameters. Therefore, given sufficient resources, performance can be improved by optimizing the trained network locally and then finalizing training on the processor to learn the machine-specific biases, as in the sketch below.
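One plausible realization of this hardware-in-the-loop fine-tuning, not taken from the paper itself, is a few SPSA-style optimization steps evaluated directly on the target QPU. Here `evaluate_cost` is a hypothetical callable that submits the trained circuit with the given parameters to the device and returns the measured fidelity-based loss.

```python
import numpy as np

def fine_tune_on_device(params, evaluate_cost, steps=20, lr=0.05, eps=0.1):
    """SPSA-style fine-tuning sketch: a few on-device steps let the
    simulator-trained parameters absorb machine-specific bias."""
    params = np.asarray(params, dtype=float)
    for _ in range(steps):
        # Random +/-1 perturbation direction.
        delta = np.random.choice([-1.0, 1.0], size=params.shape)
        # Two cost evaluations per step, each a batch of QPU shots.
        y_plus = evaluate_cost(params + eps * delta)
        y_minus = evaluate_cost(params - eps * delta)
        # For +/-1 entries, 1/delta_i == delta_i, so this is the SPSA gradient.
        grad = (y_plus - y_minus) / (2 * eps) * delta
        params = params - lr * grad
    return params
```

SPSA is attractive in this setting because it needs only two device evaluations per step regardless of the number of parameters, which keeps the queue time and shot budget on a shared IBM-Q machine manageable.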
FIGURE 14: (1,5) and (3,6) MNIST Binary Classifications on IBM-Q Quantum Computers.
FIGURE 15: IBM-Q Lima Topology.

VI. DISCUSSION AND CONCLUSION
In this work, we propose co-TenQu, a collaborative quantum-classical architecture for quantum neural networks. On the classical side, it utilizes a tensor network with trainable layers to preprocess the dataset, extracting features and reducing dimensionality. On the quantum side, it employs a quantum-state-fidelity-based cost function to train the model. Compared to classical deep neural networks, co-TenQu achieves a 41.72% accuracy improvement with a 49.54% reduction in parameter count. Additionally, it outperforms other quantum-based solutions by up to 1.9 times in multi-class classification. Furthermore, it achieves similar performance with 70.59% fewer quantum resources. co-TenQu represents a notable advancement in the realm of quantum deep learning. However, there remains considerable room for progress. Due to the limitations of current quantum machines, existing solutions can only be evaluated on small datasets, such as MNIST. In addition, the 10-class classification of MNIST resulted in 73.21% accuracy, which is relatively modest in comparison to classical counterparts. Although classical methods employ far more parameters, they achieve accuracies approaching 100%, which highlights both the remaining gap and the potential benefit that quantum computing could offer.
Our future research will concentrate on extending the quantum-state-fidelity-based cost function and the collaborative quantum-classical architecture to other applications, such as quantum transformers and quantum natural language processing. Additionally, the low-qubit representation and its resilience to dynamic noise in quantum-based learning warrant further investigation.



REFERENCES
[1] N. C. Thompson, K. Greenewald, K. Lee, and G. F. Manso, "The computational limits of deep learning," 2020. [Online]. Available: https://arxiv.org/abs/2007.05558
[2] I. L. Chuang, N. Gershenfeld, and M. Kubinec, "Experimental implementation of fast quantum searching," Phys. Rev. Lett., vol. 80, pp. 3408–3411, Apr. 1998. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevLett.80.3408
[3] https://quantum-computing.ibm.com/.
[4] F. Arute, K. Arya, R. Babbush, D. Bacon et al., "Quantum supremacy using a programmable superconducting processor," Nature, vol. 574, pp. 505–510, 2019. [Online]. Available: https://www.nature.com/articles/s41586-019-1666-5
[5] P. W. Shor, "Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer," SIAM Journal on Computing, vol. 26, no. 5, pp. 1484–1509, Oct. 1997. [Online]. Available: https://doi.org/10.1137/S0097539795293172
[6] L. K. Grover, "A fast quantum mechanical algorithm for database search," 1996. [Online]. Available: https://arxiv.org/abs/quant-ph/9605043
[7] S. Garg and G. Ramakrishnan, "Advances in quantum deep learning: An overview," 2020. [Online]. Available: https://arxiv.org/abs/2005.04316
[8] K. Beer, D. Bondarenko, T. Farrelly, T. J. Osborne, R. Salzmann, D. Scheiermann, and R. Wolf, "Training deep quantum neural networks," Nature Communications, vol. 11, no. 1, pp. 1–6, 2020.
[9] I. Kerenidis, A. Luongo, and A. Prakash, "Quantum expectation-maximization for Gaussian mixture models," 2019. [Online]. Available: https://arxiv.org/abs/1908.06657
[10] T. Li, S. Chakrabarti, and X. Wu, "Sublinear quantum algorithms for training linear and kernel-based classifiers," 2019. [Online]. Available: https://arxiv.org/abs/1904.02276
[11] C. Ding, T.-Y. Bao, and H.-L. Huang, "Quantum-inspired support vector machine," 2019. [Online]. Available: https://arxiv.org/abs/1906.08902
[12] A. Panahi, S. Saeedi, and T. Arodz, "word2ket: Space-efficient word embeddings inspired by quantum entanglement," 2019. [Online]. Available: https://arxiv.org/abs/1911.04975
[13] P. Kaye, R. Laflamme, and M. Mosca, An Introduction to Quantum Computing. Oxford University Press, 2007.
[14] S. A. Stein, B. Baheri, D. Chen, Y. Mao, Q. Guan, A. Li, S. Xu, and C. Ding, "QuClassi: A hybrid deep neural network architecture based on quantum state fidelity," Proceedings of Machine Learning and Systems, vol. 4, 2022.
[15] S. A. Stein, R. L'Abbate, W. Mu, Y. Liu, B. Baheri, Y. Mao, G. Qiang, A. Li, and B. Fang, "A hybrid system for learning classical data in quantum states," in 2021 IEEE International Performance, Computing, and Communications Conference (IPCCC). IEEE, 2021, pp. 1–7.
[16] S. A. Stein, B. Baheri, D. Chen, Y. Mao, Q. Guan, A. Li, B. Fang, and S. Xu, "QuGAN: A quantum state fidelity based generative adversarial network," in 2021 IEEE International Conference on Quantum Computing and Engineering (QCE). IEEE, 2021, pp. 71–81.
[17] S. Y.-C. Chen, C.-M. Huang, C.-W. Hsing, and Y.-J. Kao, "An end-to-end trainable hybrid classical-quantum classifier," Machine Learning: Science and Technology, vol. 2, no. 4, p. 045021, 2021.
[18] T. Hur, L. Kim, and D. K. Park, "Quantum convolutional neural network for classical data classification," Quantum Machine Intelligence, vol. 4, no. 1, p. 3, 2022.
[19] E. H. Houssein, Z. Abohashima, M. Elhoseny, and W. M. Mohamed, "Machine learning in the quantum realm: The state-of-the-art, challenges, and future vision," Expert Systems with Applications, p. 116512, 2022.
[20] F. V. Massoli, L. Vadicamo, G. Amato, and F. Falchi, "A leap among quantum computing and quantum neural networks: A survey," ACM Computing Surveys, vol. 55, no. 5, pp. 1–37, 2022.
[21] M. Cerezo, G. Verdon, H.-Y. Huang, L. Cincio, and P. J. Coles, "Challenges and opportunities in quantum machine learning," Nature Computational Science, vol. 2, no. 9, pp. 567–576, 2022.
[22] W.-L. Chang and A. V. Vasilakos, Fundamentals of Quantum Programming in IBM's Quantum Computers. Springer, 2021.
[23] A. D'Onofrio, A. Hossain, L. Santana, N. Machlovi, S. Stein, J. Liu, A. Li, and Y. Mao, "Distributed quantum learning with co-management in a multi-tenant quantum system," in 2023 IEEE International Conference on Big Data (BigData). IEEE, 2023, pp. 221–228.
[24] K. Beer, D. Bondarenko, T. Farrelly, T. J. Osborne, R. Salzmann, D. Scheiermann, and R. Wolf, "Training deep quantum neural networks," Nature Communications, vol. 11, no. 1, p. 808, 2020.
[25] A. Abbas, D. Sutter, C. Zoufal, A. Lucchi, A. Figalli, and S. Woerner, "The power of quantum neural networks," Nature Computational Science, vol. 1, no. 6, pp. 403–409, 2021.
[26] Z. Liang, H. Wang, J. Cheng, Y. Ding, H. Ren, Z. Gao, Z. Hu, D. S. Boning, X. Qian, S. Han et al., "Variational quantum pulse learning," in 2022 IEEE International Conference on Quantum Computing and Engineering (QCE). IEEE, 2022, pp. 556–565.
[27] P. Easom-McCaldin, A. Bouridane, A. Belatreche, R. Jiang, and S. Al-Maadeed, "Efficient quantum image classification using single qubit encoding," IEEE Transactions on Neural Networks and Learning Systems, 2022.
[28] "MNIST," http://yann.lecun.com/exdb/mnist/.
[29] E. Farhi and H. Neven, "Classification with quantum neural networks on near term processors," 2018. [Online]. Available: https://arxiv.org/abs/1802.06002
[30] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, "Quantum circuit learning," Physical Review A, vol. 98, no. 3, p. 032309, 2018.
[31] M. Ostaszewski, L. M. Trenkwalder, W. Masarczyk, E. Scerri, and V. Dunjko, "Reinforcement learning for optimization of variational quantum circuit architectures," Advances in Neural Information Processing Systems, vol. 34, pp. 18182–18194, 2021.
[32] J. Stokes, J. Izaac, N. Killoran, and G. Carleo, "Quantum natural gradient," Quantum, vol. 4, p. 269, May 2020. [Online]. Available: https://doi.org/10.22331/q-2020-05-25-269
[33] I. Cong, S. Choi, and M. D. Lukin, "Quantum convolutional neural networks," Nature Physics, vol. 15, no. 12, pp. 1273–1278, Aug. 2019. [Online]. Available: https://doi.org/10.1038/s41567-019-0648-8
[34] S. Stein, Y. Mao, J. Ang, and A. Li, "QuCNN: A quantum convolutional neural network with entanglement based backpropagation," in 2022 IEEE/ACM 7th Symposium on Edge Computing (SEC). IEEE, 2022, pp. 368–374.
[35] W. Jiang, J. Xiong, and Y. Shi, "A co-design framework of neural networks and quantum circuits towards quantum advantage," Nature Communications, vol. 12, no. 1, Jan. 2021. [Online]. Available: https://doi.org/10.1038/s41467-020-20729-5
[36] M. Broughton, G. Verdon, T. McCourt, A. J. Martinez, J. H. Yoo, S. V. Isakov, P. Massey, R. Halavati, M. Y. Niu, A. Zlokapa et al., "TensorFlow Quantum: A software framework for quantum machine learning," arXiv preprint arXiv:2003.02989, 2020.
[37] S. Y.-C. Chen, C.-M. Huang, C.-W. Hsing, and Y.-J. Kao, "Hybrid quantum-classical classifier based on tensor network and variational quantum circuit," 2020. [Online]. Available: https://arxiv.org/abs/2011.14651
[38] V. Bergholm, J. Izaac, M. Schuld, C. Gogolin, M. S. Alam, S. Ahmed, J. M. Arrazola, C. Blank, A. Delgado, S. Jahangiri, K. McKiernan, J. J. Meyer, Z. Niu, A. Száva, and N. Killoran, "PennyLane: Automatic differentiation of hybrid quantum-classical computations," 2018. [Online]. Available: https://arxiv.org/abs/1811.04968



[39] S. A. Stein, B. Baheri, D. Chen, Y. Mao, Q. Guan, A. Li, S. Xu, and C. Ding, "QuClassi: A hybrid deep neural network architecture based on quantum state fidelity," Proceedings of Machine Learning and Systems, vol. 4, pp. 251–264, 2022.
[40] W. Jiang, J. Xiong, and Y. Shi, "A co-design framework of neural networks and quantum circuits towards quantum advantage," Nature Communications, vol. 12, no. 1, p. 579, 2021.
[41] "TensorFlow Quantum Fair Comparison," https://www.tensorflow.org/quantum/tutorials/mnist, [Online; accessed 07-March-2023].
[42] S. Stein, N. Wiebe, Y. Ding, P. Bo, K. Kowalski, N. Baker, J. Ang, and A. Li, "EQC: Ensembled quantum computing for variational quantum algorithms," in Proceedings of the 49th Annual International Symposium on Computer Architecture, 2022, pp. 59–71.

RYAN L'ABBATE is a database developer and research chemist at En-Tech Corp. He received bachelor's degrees in Chemical Engineering and Mathematics from Manhattan College in December 2017 and a master's degree in Data Science from the Department of Computer and Information Science at Fordham University in May 2022. He was inducted into the Omega Chi Epsilon honor society for chemical engineering and the Pi Mu Epsilon honor society for mathematics. He has also had data science research published in the journal Haseltonia. His research interests include quantum computing, quantum data science, and data structures.

ANTHONY D'ONOFRIO JR. is a former graduate student of the Department of Computer and Information Science at Fordham University in New York City. He received his bachelor's degree in Computer Science from Fordham University in 2022 and his master's degree in May 2023. During his time at Fordham, he was a Fordham-IBM research intern and was inducted into Fordham University's chapter of Sigma Xi, the Scientific Research Honor Society, as an Associate Member for his work. His research interests focus on distributed systems, quantum systems, quantum deep learning, and software engineering.

SAMUEL STEIN has been a Staff Scientist in the high-performance computing (HPC) group of Pacific Northwest National Laboratory (PNNL) since December 2022. He received his bachelor's degree in Chemical Engineering from the University of Cape Town, South Africa, in 2018 and his master's degree in Data Science from Fordham University in 2020. His research has focused on quantum machine learning, quantum error mitigation, and distributed quantum computing. More recently, it has centered on heterogeneous quantum computing designs and distributed quantum computing architectures.

SAMUEL YEN-CHI CHEN was an assistant computational scientist at Brookhaven National Laboratory, Upton, NY, USA. Dr. Chen received his Ph.D. degree in physics from National Taiwan University, Taipei, Taiwan, in 2020. He received the B.S. degree in physics and the M.D. degree in medicine from National Taiwan University, Taipei, Taiwan, in 2016. His research focuses on combining quantum computing and machine learning. He was the recipient of the Theoretical High-Energy Physics Fellowship from the Chen Cheng Foundation in 2014 and First Prize in the Software Competition (Research Category) from Xanadu Quantum Technologies in 2019.

ANG LI is a senior computer scientist in the Physical and Computational Directorate of Pacific Northwest National Laboratory and an affiliated Associate Professor at the University of Washington, WA, USA. He received his B.E. from Zhejiang University, China, and two Ph.D. degrees, from the National University of Singapore and the Eindhoven University of Technology, Netherlands. His research focuses on software-hardware codesign for scalable heterogeneous HPC and quantum computing.

PIN-YU CHEN is a principal research staff member at IBM Research, Yorktown Heights, NY, USA. He is also the chief scientist of the RPI-IBM AI Research Collaboration and PI of ongoing MIT-IBM Watson AI Lab projects. Dr. Chen received his Ph.D. degree in electrical engineering and computer science from the University of Michigan, Ann Arbor, USA, in 2016. His recent research focuses on adversarial machine learning and the robustness of neural networks, and his long-term research vision is to build trustworthy machine learning systems. At IBM Research, he received the honor of IBM Master Inventor and several research accomplishment awards, including an IBM Corporate Technical Award in 2021. His research contributes to IBM open-source libraries, including the Adversarial Robustness Toolbox (ART 360) and AI Explainability 360 (AIX 360). He has published more than 50 papers related to trustworthy machine learning at major AI and machine learning conferences. He is a member of IEEE and an associate editor of Transactions on Machine Learning Research.

JUNTAO CHEN (S'15-M'21) received the Ph.D. degree in Electrical Engineering from New York University (NYU), Brooklyn, NY, in 2020, and the B.Eng. degree in Electrical Engineering and Automation with honors from Central South University, Changsha, China, in 2014. He is currently an assistant professor in the Department of Computer and Information Sciences and an affiliated faculty member with the Fordham Center of Cybersecurity, Fordham University, New York, USA. His research interests include cyber-physical security and resilience, quantum AI and its security, game and decision theory, and network optimization and learning. He was a recipient of the Ernst Weber Fellowship, the Dante Youla Award, and the Alexander Hessel Award for the Best Ph.D. Dissertation in Electrical Engineering from NYU.

YING MAO is an Associate Professor in the Department of Computer and Information Science at Fordham University in New York City. He received his Ph.D. in Computer Science from the University of Massachusetts Boston in 2016. He was a Fordham-IBM research fellow. His research interests mainly focus on quantum systems, quantum deep learning, quantum-classical optimizations, quantum system virtualization, cloud resource management, data-intensive platforms, and containerized applications.
