
Classical-to-quantum convolutional neural network transfer learning

Juhyeon Kima , Joonsuk Huha,b,c , Daniel K. Parkd,e,∗


a SKKU Advanced Institute of Nanotechnology, Sungkyunkwan University, Suwon, Republic of Korea
b Department of Chemistry, Sungkyunkwan University, Suwon, Republic of Korea
c Institute of Quantum Biophysics, Sungkyunkwan University, Suwon, Republic of Korea
d Department of Applied Statistics, Yonsei University, Seoul, Republic of Korea
e Department of Statistics and Data Science, Yonsei University, Seoul, Republic of Korea
arXiv:2208.14708v2 [quant-ph] 28 Sep 2023

Abstract
Machine learning using quantum convolutional neural networks (QCNNs) has demonstrated success in
both quantum and classical data classification. In previous studies, QCNNs attained a higher classification
accuracy than their classical counterparts under the same training conditions in the few-parameter regime.
However, the general performance of large-scale quantum models is difficult to examine because of the limited
size of quantum circuits that can be reliably implemented in the near future. We propose transfer learning
as an effective strategy for utilizing small QCNNs in the noisy intermediate-scale quantum era to the full
extent. In the classical-to-quantum transfer learning framework, a QCNN can solve complex classification
problems without requiring a large-scale quantum circuit by utilizing a pre-trained classical convolutional
neural network (CNN). We perform numerical simulations of QCNN models with various sets of quantum
convolution and pooling operations for MNIST data classification under transfer learning, in which a classical
CNN is trained with Fashion-MNIST data. The results show that transfer learning from classical to quantum
CNN performs considerably better than purely classical transfer learning models under similar training
conditions.
Keywords: Quantum computing, Quantum machine learning, Quantum convolutional neural network,
Transfer learning

1. Introduction

Machine learning (ML) with a parameterized quantum circuit (PQC) is a promising approach for improving
existing methods beyond classical capabilities [1–7]. This is a classical-quantum hybrid algorithm in which
the cost function and its corresponding gradient are computed using quantum circuits [8, 9] and the model
parameters are updated classically. Such hybrid ML models are particularly advantageous when cost function
minimization is difficult to perform classically [4, 10, 11]. These models optimize the quantum gate parameters
under the given experimental setup, and hence can be robust to systematic errors. Furthermore, they are less
prone to decoherence because iterative computation can be exploited to reduce the quantum circuit depth.
Thus, the hybrid algorithm has the potential to achieve quantum advantage in solving various problems in
the noisy intermediate-scale quantum (NISQ)1 era [12, 13].
A critical challenge in the utilization of PQC for solving real-world problems is the barren plateau
phenomenon in the optimization landscape, which makes training the quantum model that samples from the
Haar measure difficult as the number of qubits increases [14]. One way to avoid the barren plateau is to

∗ Corresponding author
Email addresses: [email protected] (Joonsuk Huh), [email protected] (Daniel K. Park)
1 NISQ refers to the domain of quantum computing where the number of quantum processors that can be manipulated

reliably is limited due to noise, yet holds the potential to surpass classical capabilities to a certain extent. As NISQ
technology becomes increasingly accessible, the discovery of its real-world applications has become crucially important.

adopt a hierarchical structure [15, 16], in which the number of qubits decreases exponentially with quantum
circuit depth, such as in quantum convolutional neural networks (QCNNs) [4]. The hierarchical structure is
interesting from a theoretical perspective because of its close connection to tensor networks [15, 17]. Moreover,
the shallow depth of a QCNN, which grows logarithmically with the number of input qubits, makes it well
suited for NISQ computing. In addition, an information-theoretic analysis shows that the QCNN architecture
can help reduce the generalization error [18], which is one of the central goals of machine learning. All these
factors motivate the application of QCNN for machine learning. QCNNs have been shown to be useful for
solving both quantum [4, 19] and classical [20] problems despite their restricted structure with a shallow-depth
quantum circuit. In Ref. [20], for binary classification with the MNIST [21] and Fashion-MNIST [22] datasets,
QCNN yielded higher classification accuracy than the classical convolutional neural network (CNN) when
only 51 or fewer parameters were used to construct these models. The best-known classical CNN-based
classifiers for the same datasets typically employ millions of parameters. However, the size of the quantum
circuits that can be implemented with current quantum devices is too small to incorporate such a large
number of parameters. Therefore, two important issues remain. The first is to verify whether a QCNN can
continue to outperform its classical counterpart as the number of trainable model parameters increases. The
second is to utilize small QCNNs that can be realized in the near future to the full extent, so that a quantum
advantage can be achieved in solving practical problems. The latter is the main focus of this work.
An ML problem for which the quantum advantage in the few-parameter regime can be exploited is transfer
learning (TL) [23–26]. TL aims to utilize what has been learned in one setting to improve generalization in
another setting that is independent of the former. TL can be applied to classical-quantum hybrid networks
such that the parameters learned for a classical model are transferred to training a quantum model or vice
versa [27]. In the classical-to-quantum (C2Q) TL scheme, the number of qubits increases with the number of
output nodes (or features) of the pre-trained classical neural network. This indicates that the transferred part
of a classical neural network should have a small number of output nodes to find applications in the NISQ
era. For example, a pre-trained feedforward neural network with a large number of nodes throughout
its layers would not be well suited for near-term hybrid TL. By contrast, building a TL model with a
classical and quantum CNN is viable because the number of features in the CNN progressively decreases via
subsampling (i.e., pooling), and the QCNN has already exhibited an advantage with a small number of input
qubits.
Motivated by the aforementioned observations, we propose a TL framework for classical-to-quantum
convolutional neural networks (C2Q-CNNs). Unlike previous works, C2Q-CNN transfers knowledge from a
pre-trained (source) classical CNN to a quantum CNN, thereby preserving the benefits of quantum CNN.
Our method avoids the need for classical data dimensionality reduction, commonly required in existing
methods, as the classical CNN serves as the source for TL. Additionally, we introduce new ansatzes for both
quantum pooling and convolutional operations, enriching the model selection in the QCNN. To evaluate
the performance of C2Q-CNN, we conduct numerical simulations on the MNIST data classification task
using PennyLane [28]. The classical CNN is pre-trained on the Fashion-MNIST dataset. The simulations
assess the classification accuracy under different quantum convolution and pooling operations and compare
C2Q-CNN with various classical-to-classical CNN (C2C-CNN) TL schemes. The results show that C2Q-CNN
outperforms C2C-CNN with respect to classification accuracy under similar training conditions. Furthermore,
the new quantum pooling operation developed in this work is more effective in demonstrating the quantum
advantage.
The remainder of this paper is organized as follows. Section 2 reviews QCNNs and TL. This section
also introduces the generalization of the pooling operation of the QCNN. Section 3 explains the general
framework for C2Q-CNN TL. The simulation results are presented in section 4. MNIST data classification
was performed with a CNN pre-trained for Fashion-MNIST data, and the performance of the C2Q-CNN
models was compared with that of various C2C-CNN models. The conclusions and outlook are presented in
Section 5.

[Figure 1: schematic of (a) the QCNN algorithm, in which quantum convolution and pooling layers acting on the encoded state |ψ⟩d produce a cost function C(θ) that a classical computer minimizes as θnew = arg minθ C(θ), and (b) an example circuit for eight input qubits.]

Figure 1: (a) Schematics of the QCNN algorithm with (b) an example for eight input qubits. Given a quantum state, |ψ⟩d ,
which encodes classical data, the quantum circuit comprises two parts: convolutional filters (rectangles) and pooling (circles).
The convolutional filter and pooling use parameterized quantum gates. Three layers of convolution–pooling pairs are presented
in this example. In each layer, the convolutional filter applies the identical two-qubit ansatz to the nearest neighbor qubits in a
translationally invariant manner. The quantum convolutional operations in the QCNN circuits are designed to meet the closed
boundary condition, as indicated by the open-ended gates in the figure, ensuring the top and bottom qubits in each layer are
connected. Pooling operations within the layer are identical to each other, but differ from convolutional filters. The pooling
operation is represented as a controlled unitary transformation, and the half-filled circle on the control qubit indicates that
different unitary gates can be applied to each subspace of the control qubit. The measurement outcome of the quantum circuit
is used to calculate the user-defined cost function. A classical computer is used to compute the new set of parameters based on
the gradient, and the quantum circuit parameters are updated for the subsequent round. The optimization process is iterated
until pre-selected conditions are met.

2. Preliminaries

2.1. Quantum convolutional neural network


Quantum convolutional neural networks are parameterized quantum circuits with unique structures
inspired by classical CNNs [4, 20]. In general, QCNNs follow two basic principles of classical CNNs:
translational invariance of convolutional operations and dimensionality reduction via pooling. However,
QCNNs differ from classical CNNs in several aspects. First, the data are defined in quantum Hilbert space,
which grows exponentially with the number of qubits. Consequently, the quantum convolutional operation is
not an inner product as in the classical case, but a unitary transformation of the state vector: a linear map
from a vector to a vector, whereas a classical convolution is a linear map from a vector to a scalar.
The pooling in a QCNN traces out half of the qubits, similar to the pooling in
the CNN that subsamples the feature space. Typically, the pooling layer includes parameterized two-qubit
controlled-unitary gates, and the control qubits are traced out after the gate operations. Without loss of
generality, we refer to the structure of a parameterized unitary operator for either convolution or pooling as
ansatz. The cost function of a model with given parameters is defined with an expectation value of some
observable with respect to the final quantum state obtained after repeating quantum convolutional and
pooling operations. The QCNN is trained by updating the model parameters to minimize the cost function
until a pre-determined convergence condition is met. The general concept of a QCNN is illustrated in Fig. 1
(a). An example of a circuit with eight input qubits is shown in (b). The depth of the QCNN circuit after
repeating the convolution and pooling until one qubit remains is O(log N ), where N is the number of input
qubits. This shallow depth allows the QCNN to perform well on quantum hardware that will be developed
in the near future.
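The layered structure described above can be prototyped in a few lines of PennyLane, the library used for the simulations in Section 4. The sketch below is illustrative only: the two-qubit filter shown here (Ry rotations followed by a CNOT) is a stand-in rather than one of the benchmark ansatzes of Fig. 2, and the pooling step is reduced to a single controlled rotation.

import pennylane as qml

n_qubits = 8
dev = qml.device("default.qubit", wires=n_qubits)

def conv_filter(params, wires):
    # illustrative two-qubit convolution filter (placeholder for the ansatzes of Fig. 2)
    qml.RY(params[0], wires=wires[0])
    qml.RY(params[1], wires=wires[1])
    qml.CNOT(wires=wires)

def pool(params, source, sink):
    # controlled rotation; the control (source) qubit is simply dropped afterwards
    qml.CRZ(params[0], wires=[source, sink])

@qml.qnode(dev)
def qcnn(x, weights):
    # weights: one (conv_params, pool_params) pair per layer, shared within the layer
    qml.AmplitudeEmbedding(x, wires=range(n_qubits), normalize=True)
    active = list(range(n_qubits))
    for conv_params, pool_params in weights:
        n = len(active)
        for i in range(n):  # translationally invariant filter with closed boundary
            conv_filter(conv_params, wires=[active[i], active[(i + 1) % n]])
        kept = []
        for i in range(0, n, 2):  # halve the number of active qubits
            pool(pool_params, source=active[i], sink=active[i + 1])
            kept.append(active[i + 1])
        active = kept
    return qml.probs(wires=active)  # a single qubit remains after three layers

The two probabilities returned for the remaining qubit can then be fed to the user-defined cost function of Fig. 1(a), such as the binary cross-entropy used in Section 4.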
The quantum convolution and pooling operations can be parameterized in many ways. The convolution
ansatzes evaluated in this study are illustrated in Fig. 2. Among them, circuits (b) to (j) are the nine ansatzes
previously tested in Ref. [20]. These ansatzes are motivated by past studies. For instance, circuit (b) is a
parameterized quantum circuit that was used to train a tree tensor network (TTN) [15]. The four-qubit
parameterized quantum circuits analyzed by Sim et al.[29] were modified to two-qubit circuits to serve as

[Figure 2 circuit diagrams: two-qubit convolution ansatzes (a) Convolution 1 through (k) Convolution 11, built from Ry, Rx, Rz, H, and general single-qubit U(θ, ϕ, λ) gates.]

Figure 2: Parameterized quantum circuits used in the convolutional layer. The convolutional circuits from (b) to (j) are adapted
from Ref. [20], whereas (a) and (k) are the new convolutional circuits tested in this study. Ri (θ) is the rotation around the
i-axis of the Bloch sphere by an angle of θ, and H is the Hadamard gate. U (θ, ϕ, λ) is an arbitrary single-qubit gate, which can
be expressed as U (θ, ϕ, λ) = Rz (ϕ)Rx (−π/2)Rz (θ)Rx (π/2)Rz (λ). U (θ, ϕ, λ) can implement any unitary operation in SU (2).
As (j) can express an arbitrary two-qubit unitary gate, we test it without any parameterized gates for pooling in addition to ZX
pooling and generalized pooling. For (k), we do not apply parameterized gates for pooling. In these cases, pooling simply traces
out the top qubit after convolution.

building blocks for the convolution layer, resulting in circuits (c), (d), (e), (f), (h), and (i). Circuits (h) and
(i) are two-qubit versions of the circuits with the best expressibility, while circuit (c) is a two-qubit version of
the circuit with the best entangling capability. Circuits (d), (e), and (f) represent a good balance of expressibility and entangling capability. Circuit
(g) is used for the two-body variational quantum eigensolver entangler [30] and can generate arbitrary
SO(4) gates [31], making it a suitable candidate for building the convolution layer in a QCNN. Circuit (j) is a
parameterized arbitrary SU (4) gate [19, 32] capable of performing arbitrary two-qubit unitary operations.
Because the convolutional operations act on two qubits, parameterized SU (4) operations provide the most
general ansatz. In this study, we introduce two new convolutional ansatzes, (a) and (k), to our benchmark.
The former aims to study the classification capability of a QCNN when only pooling operations are trained.
The latter is inspired by the generalized pooling operation described in the following paragraph, with an
SU (2) gate applied to a control qubit to split the subspaces in an arbitrary superposition.
The pooling ansatzes used in previous studies were simple single-qubit-controlled rotations followed by
tracing out the control qubit. For example, in Ref. [20], a pooling operation in the following form was used:

TrA [(|1⟩⟨1|A ⊗ Rz(θ1)B + |0⟩⟨0|A ⊗ Rx(θ2)B) ρAB Up†],                                        (1)

where TrA (·) represents a partial trace over subsystem A, Ri (θ) is the rotation around the i axis of the Bloch
sphere by an angle of θ, θ1 and θ2 are the free parameters, ρAB is a two-qubit state subject to pooling, and
[Figure 3 circuit diagrams: (a) ZX Pooling, (b) Generalized Pooling.]

Figure 3: Parameterized quantum gates used in the pooling layer. The pooling circuit (a) is adapted from Ref. [20], and (b)
is the generalized pooling method introduced in this work. Generalized pooling applies two arbitrary single-qubit unitary
gate rotations, U (θ1 , ϕ2 , λ3 ) and U (θ4 , ϕ5 , λ6 ), which are activated when the control qubit is 1 (filled circle) or 0 (open circle),
respectively. The control (first) qubit is traced out after the gate operations to reduce the dimensions. The single-qubit unitary
gate is defined as U (θ, ϕ, λ) = Rz (ϕ)Rx (−π/2)Rz (θ)Rx (π/2)Rz (λ), and it can implement any unitary in SU (2). The thinner
horizontal line (top qubit) indicates the qubit that is being traced out after gate operations.

Up† is the conjugate transpose of the unitary gate for pooling. The pooling operation in Eq. (1) is referred to
as ZX pooling. In addition to ZX pooling, generalized pooling is introduced as

TrA [(|1⟩⟨1|A ⊗ U(θ1, ϕ2, λ3)B + |0⟩⟨0|A ⊗ U(θ4, ϕ5, λ6)B) ρAB Up†].                          (2)

Here, U (θ, ϕ, λ) = Rz (ϕ)Rx (−π/2)Rz (θ)Rx (π/2)Rz (λ) and can implement any unitary operator in SU (2).
Again, Up† is the conjugate transpose of the corresponding unitary gate for pooling. The unitary gates used
in ZX pooling and generalized pooling are compared in Fig. 3.
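Because pooling is a two-outcome conditional unitary, both Eq. (1) and Eq. (2) can be realized with controlled gates, using a Pauli-X sandwich on the control to activate the |0⟩ branch; tracing out the control then amounts to leaving that qubit out of all subsequent gates and measurements. The PennyLane sketch below follows this construction, with qml.Rot standing in for the general SU(2) gate U(θ, ϕ, λ) of the text (an equivalent but differently parameterized decomposition).

import pennylane as qml

def zx_pooling(theta1, theta2, source, sink):
    qml.CRZ(theta1, wires=[source, sink])   # Rz branch, applied when the control is |1⟩
    qml.PauliX(wires=source)
    qml.CRX(theta2, wires=[source, sink])   # Rx branch, applied when the control is |0⟩
    qml.PauliX(wires=source)

def generalized_pooling(params, source, sink):
    # params = (θ1, ϕ2, λ3, θ4, ϕ5, λ6); qml.Rot is an arbitrary SU(2) rotation
    qml.CRot(params[0], params[1], params[2], wires=[source, sink])  # |1⟩ branch
    qml.PauliX(wires=source)
    qml.CRot(params[3], params[4], params[5], wires=[source, sink])  # |0⟩ branch
    qml.PauliX(wires=source)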

2.2. Transfer learning


Transferring the knowledge accumulated from one task to another is a typical intelligent behavior that
human learners always experience. TL refers to the application of this concept in ML. Specifically, TL aims
to improve the training of a new ML model by utilizing a reference (or source) ML model that is pre-trained
for a different but related task with a different dataset [23–26]. Transfer learning encompasses three main
categories: inductive transfer learning (ITL), transductive transfer learning (TTL), and unsupervised transfer
learning (UTL) [24, 33]. ITL applies when label information is available for the target domain, while TTL
applies when label information is only available for the source domain. UTL, on the other hand, applies when
label information is unavailable for both the source and target domains. In this study, we have chosen to focus
exclusively on ITL to ensure simplicity and clarity in our explanations and demonstrations. Henceforth, when
we refer to transfer learning, it pertains specifically to ITL. Detailed information regarding our numerical
simulations will be presented later in the manuscript.
TL is known to be particularly useful for training a deep learning model that would otherwise take a long time owing
to the large amount of data, especially if the features extracted in the early layers are generic across various
datasets. In such cases, starting from a pre-trained network such that only a portion of the model parameters
is fine-tuned for a particular task can be more practical than training the entire network from scratch. For
example, suppose that a neural network is trained with data A to solve task A and finds the set of parameters
(i.e., weights and biases) wA ∈ RNA . To solve task B given dataset B, the neural network is not trained
from scratch, as this may require vast computational resources. Instead, the parameters associated with
some of the earlier layers of the reference neural network are used as a set of fixed parameters for the new
neural network that is subjected to solving task B with data B. In other words, some elements of the
parameters for this new learning problem, denoted by wB ∈ RNB , are identical to those of wA . Hence the
number of parameters subject to new optimization is less than NB . The successful application of TL can
improve training performance by starting from a higher training accuracy, achieving a faster rate of accuracy
improvement, and converging to a higher asymptotic training accuracy [34].
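As a concrete illustration of this freeze-and-fine-tune recipe, the Keras sketch below reuses all but the last layer of a hypothetical pre-trained model model_A as a frozen feature extractor and trains only a small new head for task B; the two-class head and the layer slicing are assumptions for illustration, not the models used in this paper.

from tensorflow import keras

def build_transfer_model(model_A, num_classes):
    # reuse everything up to the penultimate layer of the source model as fixed features
    feature_extractor = keras.Model(inputs=model_A.input,
                                    outputs=model_A.layers[-2].output)
    feature_extractor.trainable = False      # the transferred parameters w_A stay frozen
    head = keras.layers.Dense(num_classes, activation="softmax")
    return keras.Sequential([feature_extractor, head])

# model_B = build_transfer_model(model_A, num_classes=2)
# model_B.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model_B.fit(x_B, y_B)   # only the new head (a small subset of w_B) is optimized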
The aforementioned observations imply that TL is also beneficial when the amount of available data
is insufficient or too small to build a good model. Because processing big data in the
NISQ era will be challenging, working with small amounts of data through TL is a promising strategy for
near-term quantum ML. The target ML model subjected to fresh training (i.e., fine-tuning) in TL typically
has a much smaller number of parameters than the pre-trained model. This and the success of QCNN in the
few-parameter regime together promote the development of the classical-to-quantum CNN transfer learning.

3. Classical-to-quantum transfer learning

An extension of TL to quantum ML was proposed, and its general concept was formulated in Ref. [27].
Although the performances of the quantum models were not compared with those of their classical counterparts,
three different scenarios of quantum TL, namely C2Q, quantum-to-classical, and quantum-to-quantum, were
shown to be feasible. Among these three possible scenarios, we focus on C2Q TL as mentioned in Section 1,
because we aim to utilize QCNNs in the few-parameter regime to the full extent. Sufficient reduction of
the data dimensionality (i.e., the number of attributes or features) by classical learning would ensure that
the size of a quantum circuit subject to training is sufficiently small for implementation with NISQ devices.
The dimensionality reduction technique is also necessary to simplify expensive quantum state preparation
routines to represent classical data in a quantum state [35–43].
C2Q TL has been utilized for image data classification [27, 44] and spoken command recognition [45].
These works serve as proof of principle for the general idea and present interesting examples to motivate
further investigations and benchmarks. The parameterized quantum circuits therein are vulnerable to the
barren plateau problem, because they follow the basic structure of a fully connected feedforward neural
network with the same number of input and output qubits. Moreover, these studies used classical neural
networks to significantly reduce the number of data features to only four or eight. This means that most of
the feature extraction is performed classically; hence, the necessity of the quantum part is unclear. These
studies encode the reduced data onto a quantum circuit using simple single-qubit rotations, also known as
qubit encoding [15, 42], which makes the number of model parameters grow polynomially with the number of
data features. In contrast, the number of model parameters in our ML algorithm scales logarithmically with
the number of input qubits. Furthermore, all of these works use only one type of ansatz based on repetitive
applications of single-qubit rotations and controlled-NOT gates. Finally, the performance of C2Q TL was
not compared with that of the C2C version in any of these studies. Because the pre-trained classical neural
network performs a significant dimensionality reduction (and hence feature extraction), the absence of a
direct comparison with C2C TL raises the question of whether the quantum model achieves any advantage
over its classical counterparts.
In this study, we present a classical-to-quantum transfer learning framework with QCNN. Our framework
facilitates the transfer of knowledge from a pre-trained classical CNN to a quantum CNN, leveraging the
unique advantages offered by QCNNs. The adoption of QCNNs as the target model holds crucial importance
for several reasons. Firstly, QCNNs possess the capability to circumvent the barren plateau effect, a critical
bottleneck encountered during the training of quantum neural networks. This property of QCNNs addresses
a major challenge in quantum machine learning and enhances the training process. Furthermore, previous
research has demonstrated the advantages of QCNNs over their classical counterparts, particularly in the
few-parameter regime, along with their good generalization capabilities. As a result, fine-tuning a machine
learning model using a QCNN is expected to yield enhanced classification performance compared to fine-tuning
with a traditional CNN. To illustrate the practical implementation of our C2Q TL framework, we provide
an example of transfer learning using C2Q-CNN, which serves as the basis for our benchmark studies. The
schematic representation of this process can be found in Fig. 4, showcasing the application of our proposed
framework.
The general model is flexible with the choice of data encoding, which loads classical data features to a
quantum state |ψ⟩d , and ansatz, the quantum circuit model subject to training. We performed extensive
benchmarking over the various ansatzes presented in Section 2.1 to classify MNIST data using a classical
model pre-trained with Fashion-MNIST data. Finally, we compared the classification accuracies of C2Q and
various C2C models. The C2Q models performed noticeably better than all C2C models tested in this study
under similar training conditions. More details on the simulation and results are presented in the following
section.

[Figure 4: schematic of the source CNN (2D convolution, max pooling, batch normalization, and dense layers) pre-trained on Fashion-MNIST, whose earlier layers are transferred to prepare the encoded state |ψ⟩d of a QCNN for MNIST classification.]
Figure 4: An example of classical-to-quantum convolutional neural network transfer learning simulated in this work for
benchmarking and comparison to purely classical models. A source CNN is trained on the Fashion-MNIST dataset. Then, the
transfer learning trains a QCNN for MNIST data classification by utilizing the earlier layers of the pre-trained CNN for feature
extraction. The source CNN contains convolution (conv.), pooling, dense and batch normalization (BN) layers.

4. Simulation Results

To demonstrate the advantage of C2Q-CNN, we performed classical simulations of binary classification


using PennyLane [28]. The benchmark was performed using two standard image datasets, MNIST and
Fashion-MNIST, which were accessed through Keras [46]. Examples of the datasets are shown on the left in
Fig. 4. Note that both datasets have 28×28 features and 10 classes. Among the 10 classes of MNIST data,
we performed three independent binary classification tasks aimed at distinguishing between 0 and 1, between
2 and 3, and between 8 and 9. To represent classical data as a quantum state in a QCNN, the classical data
must be encoded into a quantum state. The number of data features that can be encoded in N qubits ranges
from N to 2N depending on the choice of the encoding method [15, 40, 42, 43, 47]. Among many options,
we used amplitude encoding to represent as many features as possible (see Appendix A). All C2Q-CNN
simulations were performed with eight input qubits, to which the amplitude encoding loads 256 features.
Quantum feature maps that encode only eight features, such as qubit encoding, are not considered because
they require extreme dimensionality reduction on the classical end, which may dominate the classification
result as described in the previous section. Furthermore, amplitude encoding was shown to work well with
QCNNs for classical data classification [20].
The source classical CNN model, depicted in Fig. 4, was trained on 60,000 Fashion-MNIST data for
multinomial classification. The TL is implemented by replacing the final dense layer of the pre-trained CNN
with a QCNN. The pre-trained CNN is utilized as the source model to perform binary classification on
the MNIST dataset. The source model takes the MNIST data as input and outputs 256 values using the
parameters pre-trained on the Fashion-MNIST dataset. These output values are then encoded into 8 input
qubits of the QCNN, which is fine-tuned using 10,000 MNIST data samples. The fine-tuning process adjusts
the parameters of the QCNN to optimize its performance on the binary classification task. The trainable
parameters in the QCNN were optimized by minimizing the cross-entropy cost function with the Adam
optimizer [48] using PennyLane [28]. The number of MNIST test samples was approximately 2,000.
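The fine-tuning loop can be summarized by the hedged sketch below, which assumes a truncated Keras model pretrained_cnn that outputs the 256 transferred features per image, a PennyLane QNode qcnn(x, weights) on eight qubits returning two-class probabilities, and hypothetical arrays mnist_images and mnist_labels; n_params depends on the chosen convolution and pooling ansatzes.

import pennylane as qml
from pennylane import numpy as pnp

def cost(weights, features, labels):
    # binary cross-entropy over a mini-batch of QCNN outputs
    losses = [-pnp.log(qcnn(x, weights)[y]) for x, y in zip(features, labels)]
    return pnp.mean(pnp.stack(losses))

features = pretrained_cnn.predict(mnist_images)           # frozen classical part: (n_samples, 256)
weights = pnp.random.uniform(0, 2 * pnp.pi, size=n_params, requires_grad=True)

opt = qml.AdamOptimizer(stepsize=0.01)
for step in range(200):                                    # 200 iterations, as in the text
    idx = pnp.random.choice(len(features), size=50, replace=False)   # mini-batch of 50
    weights = opt.step(lambda w: cost(w, features[idx], mnist_labels[idx]), weights)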
C2Q transfer models can be split into three sets based on different pooling variations. The first set uses
ZX pooling with convolution circuits (a)-(j), as shown in Fig. 2. The second set includes the generalized
pooling circuit and convolution circuits (a)-(j), as shown in Fig. 2. Finally, we constructed transfer models
without parameterized quantum gates in pooling layers. We refer to this pooling strategy that merely traces
out one of the qubits as trivial pooling. Trivial pooling is tested with convolution circuits (j) and (k), as
shown in Fig. 2.
To compare the C2Q-CNN TL classification results with those of its classical counterparts, 1D and 2D CNN C2C
TL models were constructed with a similar number of trainable parameters as C2Q-CNN models. The 1D
CNN model was composed of a 1D convolution layer and 1D max pooling layer with 64 trainable parameters.
Similarly, the 2D CNN model was composed of a 2D convolution layer and 2D max pooling layer with 76
trainable parameters. The CNNs subjected to fine-tuning for the MNIST data use the cross-entropy cost
function with the Adam optimizer, as in the C2Q-CNN case. These classical CNN architectures are built
using Keras [46]. Detailed descriptions of C2C models are provided in Appendix B. The training process
used mini-batch gradient descent with the Adam optimizer, a batch size of 50, and a learning rate of 0.01. We also fixed the
number of training iterations at 200 for the C2Q TL and C2C TL models. The other training conditions
were kept the same in the C2Q and C2C transfer models to make the comparison as fair as possible.

[Figure 5 panels: (a) 0 vs 1, (b) 2 vs 3, (c) 8 vs 9.]

Figure 5: Summary of the classification results with PennyLane simulations (quantum part) and Keras (classical part). Each
bar represents the classification test accuracy of C2Q TL averaged over 10 instances given by the random initialization of
parameters. The different bars along the x-axis indicate that the results are for different convolution ansatz, labeled according
to Fig. 2. The unfilled, filled, and hatched bars represent the results of ZX pooling, generalized pooling, and trivial pooling,
respectively. The number of trainable model parameters for each case is shown at the top of the x-axis. The horizontal lines
represent the results of the C2C TL with 1D and 2D CNN architectures. The number of trainable model parameters for each
case is provided in the legend.

The TL classification results are shown in Fig. 5. Each bar represents the C2Q classification accuracy
averaged over ten randomly initialized parameters. Different bars along the x-axis represent different
convolutions, labeled according to Fig. 2. The unfilled, filled, and hatched bars represent the results of ZX
pooling, generalized pooling, and trivial pooling, respectively. The blue dashed line and green solid line
represent the results of the C2C TL using 1D and 2D CNN architectures, respectively.
The 0 vs. 1 classification results are shown in Fig. 5 (a). For ZX and generalized pooling, most of the
average test accuracies were greater than 95% and 98%, respectively. The test accuracy with ZX pooling
with convolution 2, 4, 7, 8, 9, and 10 ansatz and generalized pooling with all convolution ansatz is greater
than that of 1D and 2D C2C TL. The accuracy with trivial pooling with convolution 10 ansatz was also
higher than that of C2C TL. In addition, the accuracy of trivial pooling with convolution ansatz 10 is higher
than that of trivial pooling with convolution ansatz 11. This can be attributed to the fact that the ansatz
10 has more parameters and is capable of expressing an arbitrary SU (4) gate, which subsumes ansatz 11. The test
accuracy of generalized pooling is greater than that of ZX pooling when the convolution ansatz is the same.
This can be inferred from the fact that generalized pooling can be trained to learn ZX pooling.
The 2 vs. 3 classification results are shown in Fig. 5 (b). Most of the ZX pooling average accuracies
were between 70% and 90%, and most of the generalized pooling average accuracies were between 85% and
90%. The test accuracies in (b) are lower than those of (a) because 2 vs. 3 image classifications are more
difficult than 0 vs. 1 image classifications. All C2Q TL classification accuracies are higher than the C2C
classification accuracies except for ZX pooling with convolution 1 and 3, which use a much smaller number
of parameters than the purely classical TL. The test accuracy of generalized pooling is greater than that of
ZX pooling when the convolution is the same. In addition, the accuracy of trivial pooling with convolution
ansatz 10 is higher than that with ansatz 11. These results are consistent with those obtained from the 0
vs. 1 classification, providing further evidence that the ansatz with more model parameters and improved
expressibility is favored. This suggests that the use of more complex models, such as the generalized pooling,
is advantageous in solving classification problems, particularly in comparison to less expressive models such
as ZX pooling.
The 8 vs. 9 classification results are shown in Fig. 5 (c). Most of the ZX pooling average accuracies
were between 70% and 90%, and most of the generalized pooling average accuracies were between 85% and
90%. The test accuracies in (c) are lower than those in (a) because 8 vs. 9 image classification is more
difficult than 0 vs. 1 image classification, but the accuracies in (c) are similar to those in (b). All C2Q TL
classification accuracies are higher than the C2C TL classification accuracies except for ZX pooling with
convolution 1, 2, and 3, which use a much smaller number of parameters than the purely classical TL. As
before, trivial pooling with convolution ansatz 10 has higher accuracy than trivial pooling with ansatz 11.
The accuracy of generalized pooling is consistently higher than that of ZX pooling when the underlying
convolution is identical. However, there are instances where the performance of ZX pooling is comparable
to that of generalized pooling, such as with the use of convolution ansatz 7. Nevertheless, a Welch’s t-test
analysis [49] revealed that these results are not statistically significant. Based on these findings, we can
conclude that generalized pooling is a more favorable approach for solving all of the classification problems
tested, compared to ZX pooling.
The results in Fig. 5 (a), (b), and (c) show a QCNN’s tendency to perform better when the convolution
circuits have a larger number of trainable parameters. However, simply increasing the number of trainable
parameters does not always guarantee improved test accuracy because the accuracy is affected by various conditions,
such as statistical error and quantum gate arrangement. For example, ZX pooling with convolution ansatz
5, 6, and 7 in (b) have the same number of trainable parameters, but their average accuracies differ. In
the current study, overfitting was not observed as the number of model parameters was much smaller than
the number of training samples, but it is important to keep this issue in mind when designing and implementing larger
models in the future.
In summary, generalized pooling mostly produces higher classification accuracy than ZX pooling with
the same convolution circuit. This is as expected since the generalized pooling has more model parameters.
Moreover, it can be reduced to ZX pooling under appropriate parameter selection. The accuracy of all ZX
pooling, generalized pooling, and trivial pooling circuits tends to be higher when the convolution circuits
have a larger number of gate parameters. Although C2Q models have fewer trainable parameters than C2C
models, most C2Q models outperform C2C models.
To validate our findings, we conducted a Welch’s t-test analysis [49] to determine the statistical significance
of the improved classification results obtained by the C2Q models. Our results show that ZX pooling with
convolution ansatz 9 and generalized pooling with convolution ansatz 4, 5, 9, and 10 have a statistically
significant quantum advantage over both 1D and 2D classical models for all classification problems, despite
having a smaller number of model parameters. Further details on the statistical analysis can be found in
Appendix C. These findings underscore the potential of quantum-enhanced machine learning models in
solving complex classification tasks, even with limited model resources.
The underlying source of the quantum advantage in quantum computing remains an open question.
However, it is speculated that the advantage is related to certain properties of quantum computing that
have no classical equivalent. The first property is the ability of quantum measurements to discriminate
non-orthogonal states, which enables quantum computers to capture subtle differences in data that are not
captured by classical computers. The second property is the ability of quantum convolutional operations to
create entanglement among all qubits through the use of two-qubit gates between nearest neighbors, which
allows for the capture of non-local correlations. In addition, the ability of a quantum computer to store
N -dimensional data in ⌈log2 (N )⌉ qubits, and the ability of the QCNN to classify M -qubit quantum states
using only O(log(M )) parameters, make it possible to construct an extremely compact machine learning
model.

5. Conclusion

In this study, we proposed a classical-to-quantum CNN (C2Q-CNN), a transfer learning (TL) model
that uses some layers of a pre-trained CNN as a starting point for a quantum CNN (QCNN). The QCNN
constitutes an extremely compact machine learning (ML) model because the number of trainable parameters
grows logarithmically with the number of initial qubits [4] and is promising because of the absence of barren
plateaus [16] and generalization capabilities [18]. Supervised learning with a QCNN has also demonstrated
classification performance superior to that of its classical counterparts under similar training conditions for a
number of canonical datasets [20]. C2Q-CNN TL provides an approach to utilize the advantages of QCNN in
the few-parameter regime to the full extent. Moreover, the proposed method is suitable for implementation in
quantum hardware expected to be developed in the near future because it is robust to systematic errors and
can be implemented with a shallow-depth quantum circuit. Therefore, C2Q-CNN TL is a strong candidate
for practical applications of NISQ computing in ML with a quantum advantage.
To demonstrate the quantum advantage of C2Q-CNN, we conducted a comparative study between two
classical-to-classical (C2C) transfer learning (TL) models and C2Q TL models. The C2C and C2Q TL
models shared the same pre-trained CNN, with the C2C TL models having slightly more parameters than
the C2Q TL models. The pre-training was performed on the Fashion-MNIST dataset for multinomial
classification. Then the target model replaced the final dense layer of the source model and was trained for
three independent binary classification tasks using the MNIST data. Our simulation results, obtained using
PennyLane and Keras, revealed that the C2Q models consistently achieved higher classification accuracy
compared to the C2C models, despite having fewer trainable parameters. These results highlight the potential
of quantum-enhanced transfer learning in improving the performance of machine learning models. It is
important to note that while our simulation utilized a source CNN specifically designed for this study, the
C2Q-CNN TL framework is compatible with other existing CNNs, such as VGGNet [50], ResNet [51], and
DenseNet [52]. This compatibility enhances the versatility of our approach, enabling researchers to leverage
established CNN designs within the quantum-enhanced transfer learning paradigm.
The potential future research directions are as follows. First, the reason behind the quantum advantage
demonstrated by C2Q-CNN remains unclear. Although rigorous analysis is lacking, we speculate that this
advantage is related to the ability of a quantum measurement to discriminate non-orthogonal states, for which
a classical analog does not exist. Moreover, verifying whether the quantum advantage would continue to hold
as the number of trainable parameters increases and for other datasets would be interesting. To increase the
number of model parameters for a fixed number of features and input qubits, one may consider generalizing
the QCNN model to utilize multiple channels, as in many classical CNN models. Note that, in the TL tested
in our experiment, the final dense layer was replaced with a model subjected to fine-tuning, while the entire
convolutional part was frozen. Testing the various depths of frozen layers would be an interesting topic
for future research. For example, freezing a smaller number of layers to use the features of an earlier layer
of the convolutional stage can be beneficial when the new dataset is small and significantly different from
the source. The focus of this study was on inductive TL, for which both the source and new datasets were
labeled. Exploring the potential of leveraging quantum techniques in other TL scenarios, such as self-taught,
unsupervised, and transductive TL [24], is a promising direction for future research. Furthermore, extending
the C2Q TL approach to address other machine learning problems, such as semi-supervised learning [53, 54]
and one-class classification [55, 56], poses an open challenge for future research.

Data Availability

The data that support the findings of this study, including additional benchmarking examples not
explicitly mentioned in the manuscript, are available upon request.

Acknowledgments

This research was supported by the Yonsei University Research Fund of 2022 (2022-22-0124), the National
Research Foundation of Korea (Grant Nos. 2021M3H3A1038085, 2019M3E4A1079666, 2022M3E4A1074591,
2022M3H3A1063074 and 2021M3E4A1038308), and the KIST Institutional Program (2E32241-23-010).

Appendix A. Encoding classical data to a quantum state

The first step in applying quantum ML to a classical dataset is to transform the classical data into a
quantum state. Without loss of generality, we consider the classical data given as an N -dimensional real
vector ⃗x ∈ RN . Several encoding methods exist to achieve this, such as algorithms that require a quantum
circuit with O(N ) width and O(1) depth and algorithms that require O(log(N )) width and O(poly(N ))
depth [40–43]. Among the various encoding methods explored previously [20], we observed that amplitude
encoding performs best in most cases.

Appendix A.1. Amplitude encoding


Amplitude encoding encodes classical data into the probability amplitude of each computational quantum
state. Amplitude encoding transforms an N-dimensional classical data vector x = (x1, ..., xN)⊤, with N = 2^n, into an n-qubit
quantum state |ψ(x)⟩ as follows:
U(x) : x ∈ R^N → |ψ(x)⟩ = (1/||x||) Σ_{i=1}^{N} xi |i⟩,                                        (A.1)

where |i⟩ denotes the ith computational basis state. Amplitude encoding is efficient in qubit count, requiring
only O(log(N)) qubits for N features. However, the quantum circuit depth of amplitude encoding typically
increases as O(poly(N)).
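In PennyLane, the map in Eq. (A.1) is available as a built-in template; the sketch below encodes a 256-dimensional feature vector into the eight-qubit register used in our simulations, with normalize=True supplying the 1/||x|| factor.

import numpy as np
import pennylane as qml

n_qubits = 8
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def amplitude_encode(x):
    # 2^8 = 256 amplitudes; pad_with fills the vector if fewer features are supplied
    qml.AmplitudeEmbedding(x, wires=range(n_qubits), normalize=True, pad_with=0.0)
    return qml.state()

state = amplitude_encode(np.random.rand(256))   # random features, for illustration only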

Appendix A.2. Qubit encoding


Qubit encoding uses a constant quantum circuit depth while using N qubits. Qubit encoding rescales each
classical data feature xi to lie between 0 and π, and then loads xi into a single qubit as
cos(xi/2) |0⟩ + sin(xi/2) |1⟩ for i = 1, ..., N. Therefore, qubit encoding transforms x = (x1, ..., xN)⊤ into N
qubits as
U(x) : x ∈ R^N → |ψ(x)⟩ = ⊗_{i=1}^{N} [cos(xi/2) |0⟩ + sin(xi/2) |1⟩],                         (A.2)

where xi ∈ [0, π) for all i. This unitary operator U(x) can be expressed as a tensor product of single-qubit
unitary operators, U(x) = ⊗_{j=1}^{N} U_{x_j}, where

U_{x_j} = e^{−i x_j σ_y / 2} = [[cos(x_j/2), −sin(x_j/2)], [sin(x_j/2), cos(x_j/2)]].          (A.3)
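Equation (A.3) is exactly an Ry rotation by x_j, so qubit encoding can be sketched in PennyLane with the AngleEmbedding template set to Y rotations; the eight-feature example below is illustrative.

import numpy as np
import pennylane as qml

n_features = 8
dev = qml.device("default.qubit", wires=n_features)

@qml.qnode(dev)
def qubit_encode(x):
    # applies RY(x_i) = exp(-i x_i σ_y / 2) to qubit i, as in Eqs. (A.2)-(A.3)
    qml.AngleEmbedding(x, wires=range(n_features), rotation="Y")
    return qml.state()

x = np.pi * np.random.rand(n_features)   # features rescaled to [0, π)
state = qubit_encode(x)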

Appendix B. Classical Neural Network

We devised classical convolutional neural networks to compare C2C TL and C2Q TL under the same
training conditions. In particular, we assigned a similar number of model parameters to the C2C TL and C2Q
TL models. We created 1D and 2D CNN models for the C2C TL using Keras [46]. We used the same pre-trained
CNN model as in the C2Q TL, introduced in Fig. 4. To compare the test accuracies of C2C and C2Q
transfer models under the same conditions, the learning rate, batch size, and number of iterations were fixed,
and the Adam optimizer was used.

[Figure B.6 diagram: Input → 1D convolution → Max pooling → Dense layer]

Figure B.6: 1D CNN model

[Figure B.7 diagram: Input → Reshape → 2D convolution → Max pooling → 2D convolution → Max pooling → Dense layer]

Figure B.7: 2D CNN model

Appendix B.1. 1D CNN


The structure of the 1D CNN model is illustrated in Fig. B.6. The CNN takes 256 features produced by
the source (pre-trained) CNN as input, and passes them on to 1D convolution and max pooling layers. The
output feature size is reduced to 28 at the end of the max pooling layer. Finally, a dense layer is applied
to reduce the size of the output features to two for binary classification. The total number of trainable
parameters is 64.
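One Keras configuration consistent with this description (an assumption on our part, since the exact kernel and pool sizes are not stated) uses a single Conv1D filter of width 5 followed by max pooling with pool size 9, which reproduces the 28 pooled features and the total of 64 trainable parameters.

from tensorflow import keras

model_1d = keras.Sequential([
    keras.layers.Input(shape=(256, 1)),
    keras.layers.Conv1D(filters=1, kernel_size=5, activation="relu"),  # 6 parameters, 256 -> 252
    keras.layers.MaxPooling1D(pool_size=9),                            # 252 -> 28
    keras.layers.Flatten(),
    keras.layers.Dense(2, activation="softmax"),                       # 58 parameters
])
model_1d.summary()   # 6 + 58 = 64 trainable parameters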

Appendix B.2. 2D CNN


The structure of the 2D CNN model is shown in Fig. B.7. The CNN takes as input 16 × 16 data reshaped
from the 256 features produced by the source (pre-trained) CNN. This two-dimensional data is passed
on to the 2D convolution and max pooling layer twice, and the output features are reduced to eight. Finally,
a dense layer is applied to reduce the size of the output features to two for binary classification. The total
number of trainable parameters is 76.
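Similarly, one configuration consistent with this description (again an assumption, since filter counts and kernel sizes are not stated) uses two 3×3 filters in each Conv2D layer with 2×2 max pooling, which yields the 2×2×2 = 8 output features and 76 trainable parameters in total.

from tensorflow import keras

model_2d = keras.Sequential([
    keras.layers.Input(shape=(256,)),
    keras.layers.Reshape((16, 16, 1)),
    keras.layers.Conv2D(filters=2, kernel_size=(3, 3), activation="relu"),  # 20 parameters, 16x16 -> 14x14
    keras.layers.MaxPooling2D(pool_size=(2, 2)),                            # 14x14 -> 7x7
    keras.layers.Conv2D(filters=2, kernel_size=(3, 3), activation="relu"),  # 38 parameters, 7x7 -> 5x5
    keras.layers.MaxPooling2D(pool_size=(2, 2)),                            # 5x5 -> 2x2
    keras.layers.Flatten(),                                                 # 2x2x2 = 8 features
    keras.layers.Dense(2, activation="softmax"),                            # 18 parameters
])
model_2d.summary()   # 20 + 38 + 18 = 76 trainable parameters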

Appendix C. Welch’s t-test

Welch’s t-test [49] is a widely used method for assessing the equality of means between two
populations with unequal variances. In our study, the Welch’s t-test was implemented using SciPy [57] to
obtain p-values between the C2Q TL model and the C2C TL models. In accordance with standard statistical
practice [58, 59], a p-value less than α, where α is typically set to 0.05, is considered to indicate statistical
significance. Based on the obtained p-values, we conclude that if a C2Q TL model demonstrates statistical
significance compared to the C2C TL models and achieves a higher accuracy, it can be inferred to possess a
meaningful advantage (quantum advantage). The p-value results were organized into groups based on the
ZX pooling, generalized pooling, and trivial pooling approaches, as previously discussed in the paper. For
each pooling type, the p-value results were further grouped by the type of learning problem, which in this
case was binary classification for 0 and 1, 2 and 3, and 8 and 9. The results of ZX pooling are presented in
Tables C.1, C.2, and C.3 for classifying between 0 and 1, 2 and 3, and 8 and 9, respectively. The results
of generalized pooling are presented in Tables C.4, C.5, and C.6 in the same order. Finally, the results of
trivial pooling are listed in Table C.7. We highlighted in bold any statistically significant p-values where the
corresponding C2Q model exhibited higher accuracy than the C2C model.
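The test itself is a one-line SciPy call; the sketch below uses hypothetical accuracy arrays (ten values each, one per random initialization) purely to show the procedure, not results from the paper.

import numpy as np
from scipy import stats

# hypothetical test accuracies over 10 random initializations (not results from the paper)
c2q_acc = np.array([0.98, 0.97, 0.99, 0.98, 0.97, 0.99, 0.98, 0.98, 0.97, 0.99])
c2c_acc = np.array([0.95, 0.94, 0.96, 0.95, 0.95, 0.94, 0.96, 0.95, 0.94, 0.95])

# equal_var=False selects Welch's t-test (no equal-variance assumption)
t_stat, p_value = stats.ttest_ind(c2q_acc, c2c_acc, equal_var=False)
quantum_advantage = (p_value < 0.05) and (c2q_acc.mean() > c2c_acc.mean())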

convolution circuit 1 2 3 4 5 6 7 8 9 10
1D 64 p-value 0.0312 0.1289 0.7327 0.0793 0.1795 0.2957 0.0316 0.0240 0.0166 0.0422
2D 76 p-value 0.0047 0.8761 0.0323 0.5240 0.9725 0.5402 0.1348 0.0888 0.0353 0.2441

Table C.1: ZX pooling with the 0 vs. 1 classification p-value results.

convolution circuit 1 2 3 4 5 6 7 8 9 10
1D 64 p-value 0.0768 0.9352 0.9927 0.0497 0.2332 0.1700 0.0304 0.3209 0.0446 0.0119
2D 76 p-value 0.0289 0.6896 0.7954 0.0047 0.0579 0.0375 0.0023 0.0985 0.0040 0.0006

Table C.2: ZX pooling with the 2 vs. 3 classification p-value results.

convolution circuit 1 2 3 4 5 6 7 8 9 10
1D 64 p-value 0.0008 0.2373 0.4708 0.0035 0.0024 0.0012 0.0007 0.0010 0.0004 0.0002
2D 76 p-value 0.0000 0.2806 0.0495 0.1591 0.0904 0.0204 0.0117 0.0085 0.0015 0.0002

Table C.3: ZX pooling with the 8 vs. 9 classification p-value results.

convolution circuit 1 2 3 4 5 6 7 8 9 10
1D 64 p-value 0.1079 0.0215 0.0302 0.0169 0.0139 0.0297 0.0217 0.0206 0.0151 0.0103
2D 76 p-value 0.7661 0.0535 0.1171 0.0308 0.0197 0.1026 0.0598 0.0609 0.0268 0.0107

Table C.4: Generalized pooling with the 0 vs. 1 classification p-value results.

convolution circuit 1 2 3 4 5 6 7 8 9 10
1D 64 p-value 0.0921 0.0290 0.0389 0.0106 0.0099 0.0138 0.0109 0.0074 0.0068 0.0048
2D 76 p-value 0.0129 0.0021 0.0034 0.0006 0.0006 0.0008 0.0006 0.0004 0.0003 0.0002

Table C.5: Generalized pooling with the 2 vs. 3 classification p-value results.

convolution circuit 1 2 3 4 5 6 7 8 9 10
1D 64 p-value 0.0080 0.0079 0.0013 0.0009 0.0012 0.0010 0.0016 0.0004 0.0007 0.0002
2D 76 p-value 0.4616 0.4574 0.0202 0.0085 0.0261 0.0062 0.0279 0.0010 0.0038 0.0002

Table C.6: Generalized pooling with the 8 vs. 9 classification p-value results.

Classification 0 vs 1 2 vs 3 8 vs 9
Convolution 10 11 10 11 10 11
1D 64 p-value 0.0340 0.9678 0.0602 0.1097 0.0002 0.0005
2D 76 p-value 0.1305 0.1192 0.0065 0.0173 0.0003 0.0048

Table C.7: Trivial pooling p-value results.

References
[1] Jonathan Romero, Jonathan P Olson, and Alan Aspuru-Guzik. Quantum autoencoders for efficient compression of quantum
data. Quantum Science and Technology, 2(4):045001, aug 2017.
[2] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii. Quantum circuit learning. Phys. Rev. A, 98:032309, Sep 2018.
[3] Marcello Benedetti, Erika Lloyd, Stefan Sack, and Mattia Fiorentini. Parameterized quantum circuits as machine learning
models. Quantum Science and Technology, 4(4):043001, nov 2019.
[4] Iris Cong, Soonwon Choi, and Mikhail D. Lukin. Quantum convolutional neural networks. Nature Physics, 15(12):1273–1278,
December 2019.
[5] M. Cerezo, Andrew Arrasmith, Ryan Babbush, Simon C. Benjamin, Suguru Endo, Keisuke Fujii, Jarrod R. McClean,
Kosuke Mitarai, Xiao Yuan, Lukasz Cincio, and Patrick J. Coles. Variational quantum algorithms. Nature Reviews Physics,
3(9):625–644, 2021.
[6] S. Mangini, F. Tacchino, D. Gerace, D. Bajoni, and C. Macchiavello. Quantum computing models for artificial neural
networks. EPL (Europhysics Letters), 134(1):10002, April 2021.
[7] Yuxuan Du, Min-Hsiu Hsieh, Tongliang Liu, Shan You, and Dacheng Tao. Learnability of quantum neural networks. PRX
Quantum, 2:040337, Nov 2021.
[8] Jun Li, Xiaodong Yang, Xinhua Peng, and Chang-Pu Sun. Hybrid quantum-classical approach to quantum optimal control.
Phys. Rev. Lett., 118:150503, Apr 2017.
[9] Maria Schuld, Ville Bergholm, Christian Gogolin, Josh Izaac, and Nathan Killoran. Evaluating analytic gradients on
quantum hardware. Phys. Rev. A, 99:032331, Mar 2019.
[10] Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-Hong Yung, Xiao-Qi Zhou, Peter J. Love, Alán Aspuru-Guzik, and
Jeremy L. O’Brien. A variational eigenvalue solver on a photonic quantum processor. Nature Communications, 5(1):4213,
July 2014.
[11] Jarrod R McClean, Jonathan Romero, Ryan Babbush, and Alán Aspuru-Guzik. The theory of variational hybrid
quantum-classical algorithms. New Journal of Physics, 18(2):023023, feb 2016.
[12] John Preskill. Quantum Computing in the NISQ era and beyond. Quantum, 2:79, August 2018.
[13] Kishor Bharti, Alba Cervera-Lierta, Thi Ha Kyaw, Tobias Haug, Sumner Alperin-Lea, Abhinav Anand, Matthias Degroote,
Hermanni Heimonen, Jakob S. Kottmann, Tim Menke, Wai-Keong Mok, Sukin Sim, Leong-Chuan Kwek, and Alán
Aspuru-Guzik. Noisy intermediate-scale quantum algorithms. Rev. Mod. Phys., 94:015004, Feb 2022.
[14] Jarrod R. McClean, Sergio Boixo, Vadim N. Smelyanskiy, Ryan Babbush, and Hartmut Neven. Barren plateaus in quantum
neural network training landscapes. Nature Communications, 9(1):4812, Nov 2018.
[15] Edward Grant, Marcello Benedetti, Shuxiang Cao, Andrew Hallam, Joshua Lockhart, Vid Stojevic, Andrew G. Green, and
Simone Severini. Hierarchical quantum classifiers. npj Quantum Information, 4(1):65, December 2018.
[16] Arthur Pesah, M. Cerezo, Samson Wang, Tyler Volkoff, Andrew T. Sornborger, and Patrick J. Coles. Absence of barren
plateaus in quantum convolutional neural networks. Phys. Rev. X, 11:041011, Oct 2021.
[17] Rui Huang, Xiaoqing Tan, and Qingshan Xu. Variational quantum tensor networks classifiers. Neurocomputing, 452:89–98,
2021.
[18] Leonardo Banchi, Jason Pereira, and Stefano Pirandola. Generalization in quantum machine learning: A quantum
information standpoint. PRX Quantum, 2:040321, Nov 2021.
[19] Ian MacCormack, Conor Delaney, Alexey Galda, Nidhi Aggarwal, and Prineha Narang. Branching quantum convolutional
neural networks. Phys. Rev. Research, 4:013117, Feb 2022.
[20] Tak Hur, Leeseok Kim, and Daniel K. Park. Quantum convolutional neural network for classical data classification.
Quantum Machine Intelligence, 4(1):3, 2022.
[21] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of
the IEEE, 86(11):2278–2324, 1998.
[22] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning
algorithms. arXiv preprint arXiv:1708.07747, 2017.
[23] Stevo Bozinovski. Reminder of the first paper on transfer learning in neural networks, 1976. Informatica (Slovenia), 44,
2020.
[24] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering,
22(10):1345–1359, 2010.
[25] Chuanqi Tan, Fuchun Sun, Tao Kong, Wenchang Zhang, Chao Yang, and Chunfang Liu. A survey on deep transfer learning.
In Věra Kůrková, Yannis Manolopoulos, Barbara Hammer, Lazaros Iliadis, and Ilias Maglogiannis, editors, Artificial Neural
Networks and Machine Learning – ICANN 2018, pages 270–279, Cham, 2018. Springer International Publishing.
[26] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.

[27] Andrea Mari, Thomas R. Bromley, Josh Izaac, Maria Schuld, and Nathan Killoran. Transfer learning in hybrid classical-
quantum neural networks. Quantum, 4:340, October 2020.
[28] Ville Bergholm, Josh Izaac, Maria Schuld, Christian Gogolin, M. Sohaib Alam, Shahnawaz Ahmed, Juan Miguel Arrazola,
Carsten Blank, Alain Delgado, Soran Jahangiri, Keri McKiernan, Johannes Jakob Meyer, Zeyue Niu, Antal Száva,
and Nathan Killoran. Pennylane: Automatic differentiation of hybrid quantum-classical computations. arXiv preprint
arXiv:1811.04968, 2018.
[29] Sukin Sim, Peter D. Johnson, and Alán Aspuru-Guzik. Expressibility and Entangling Capability of Parameterized Quantum
Circuits for Hybrid Quantum-Classical Algorithms. Advanced Quantum Technologies, 2(12):1900070, 2019.
[30] Robert M. Parrish, Edward G. Hohenstein, Peter L. McMahon, and Todd J. Martínez. Quantum Computation of Electronic
Transitions Using a Variational Quantum Eigensolver. Physical Review Letters, 122(23):230401, 2019.
[31] Hai-Rui Wei and Yao-Min Di. Decomposition of orthogonal matrix and synthesis of two-qubit and three-qubit orthogonal
gates. Quantum Inf. Comput., 12(3-4):262–270, 2012.
[32] Farrokh Vatan and Colin Williams. Optimal quantum circuits for general two-qubit gates. Physical Review A, 69(3):032315,
2004.
[33] Fuzhen Zhuang, Zhiyuan Qi, Keyu Duan, Dongbo Xi, Yongchun Zhu, Hengshu Zhu, Hui Xiong, and Qing He. A
Comprehensive Survey on Transfer Learning. Proceedings of the IEEE, 109(1):43–76, 2021.
[34] Emilio Soria Olivas, Jose David Martin Guerrero, Marcelino Martinez Sober, Jose Rafael Magdalena Benedito, and Antonio
Jose Serrano Lopez. Handbook Of Research On Machine Learning Applications and Trends: Algorithms, Methods and
Techniques - 2 Volumes. Information Science Reference - Imprint of: IGI Publishing, Hershey, PA, 2009.
[35] Gui-Lu Long and Yang Sun. Efficient scheme for initializing a quantum register with an arbitrary superposed state. Phys.
Rev. A, 64:014303, Jun 2001.
[36] Andrei N. Soklakov and Rüdiger Schack. Efficient state preparation for a register of quantum bits. Phys. Rev. A, 73:012307,
Jan 2006.
[37] Michele Mosca and Phillip Kaye. Quantum networks for generating arbitrary quantum states. In Optical Fiber Commu-
nication Conference and International Conference on Quantum Information, page PB28. Optical Society of America,
2001.
[38] Martin Plesch and Časlav Brukner. Quantum-state preparation with universal gate decompositions. Phys. Rev. A,
83:032302, Mar 2011.
[39] Mikko Möttönen, Juha J. Vartiainen, Ville Bergholm, and Martti M. Salomaa. Transformation of quantum states using
uniformly controlled rotations. Quantum Info. Comput., 5(6):467–473, September 2005.
[40] Israel F. Araujo, Daniel K. Park, Francesco Petruccione, and Adenilton J. da Silva. A divide-and-conquer algorithm for
quantum state preparation. Scientific Reports, 11(1):6329, March 2021.
[41] T. M. L. Veras, I. C. S. De Araujo, K. D. Park, and A. J. da Silva. Circuit-based quantum random access memory for
classical data with continuous amplitudes. IEEE Transactions on Computers, pages 1–1, 2020.
[42] Ryan LaRose and Brian Coyle. Robust data encodings for quantum classifiers. Phys. Rev. A, 102:032420, Sep 2020.
[43] Israel F. Araujo, Daniel K. Park, Teresa B. Ludermir, Wilson R. Oliveira, Francesco Petruccione, and Adenilton J. da Silva.
Configurable sublinear circuits for quantum state preparation. Quantum Information Processing, 22(2):123, 2023.
[44] Harshit Mogalapalli, Mahesh Abburi, B. Nithya, and Surya Kiran Vamsi Bandreddi. Classical–Quantum Transfer Learning
for Image Classification. SN Computer Science, 3(1):20, January 2022.
[45] Jun Qi and Javier Tejedor. Classical-to-quantum transfer learning for spoken command recognition based on quantum
neural networks. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP), pages 8627–8631, 2022.
[46] François Chollet et al. Keras. https://keras.io, 2015.
[47] Vojtech Havlícek, Antonio D. Córcoles, Kristan Temme, Aram W. Harrow, Abhinav Kandala, Jerry M. Chow, and Jay M.
Gambetta. Supervised learning with quantum-enhanced feature spaces. Nature, 567(7747):209–212, 2019.
[48] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[49] B. L. Welch. The generalization of ‘Student’s’ problem when several different population variances are involved. Biometrika,
34(1-2):28–35, 01 1947.
[50] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint
arXiv:1409.1556, 2014.
[51] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. 2016 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.
[52] Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. Densely Connected Convolutional Networks.
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2261–2269, 2017.
[53] Oliver Chapelle, Bernhard Schölkopf, and Alexander Zien. Semi-supervised Learning. Cambridge:MIT Press, 2006.
[54] Wandong Zhang, Q. M. Jonathan Wu, and Yimin Yang. Semisupervised Manifold Regularization via a Subnetwork-Based
Representation Learning Model. IEEE Transactions on Cybernetics, PP(99):1–14, 2022.
[55] Lukas Ruff, Robert Vandermeulen, Nico Goernitz, Lucas Deecke, Shoaib Ahmed Siddiqui, Alexander Binder, Emmanuel
Müller, and Marius Kloft. Deep one-class classification. In International conference on machine learning, pages 4393–4402.
PMLR, 2018.
[56] Wandong Zhang, Q. M. Jonathan Wu, W. G. Will Zhao, Haojin Deng, and Yimin Yang. Hierarchical One-Class Model
With Subnetwork for Representation Learning and Outlier Detection. IEEE Transactions on Cybernetics, PP(99):1–14,
2022.
[57] Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski,
Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod

Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, C J Carey, İlhan Polat, Yu Feng,
Eric W. Moore, Jake VanderPlas, Denis Laxalde, Josef Perktold, Robert Cimrman, Ian Henriksen, E. A. Quintero, Charles R.
Harris, Anne M. Archibald, Antônio H. Ribeiro, Fabian Pedregosa, Paul van Mulbregt, and SciPy 1.0 Contributors. SciPy
1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 17:261–272, 2020.
[58] Valen E. Johnson. Revised standards for statistical evidence. Proceedings of the National Academy of Sciences, 110(48):19313–
19317, 2013.
[59] Martin Krzywinski and Naomi Altman. Significance, P values and t-tests. Nature Methods, 10(11):1041–1042, 2013.
