
Insights into Quantum Support Vector Machine

Subrahmanyam Jammalamadaka
Vellore Institute of Technology University
Aswani Kumar Cherukuri

Vellore Institute of Technology University

Research Article

Keywords: IBM Q-Experience, Machine Learning, Qiskit, Quantum Computing, Quantum Machine learning, Qubit, Support Vector Machine, Variational Quantum Circuit

Posted Date: June 5th, 2024

DOI: https://doi.org/10.21203/rs.3.rs-4465421/v1

License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Read Full License

Additional Declarations: No competing interests reported.


Insights into Quantum Support Vector Machine
Subrahmanyam Jammalamadaka1 and Aswani Kumar Cherukuri2*†
1 School of Computer Science Engineering & Information Systems, Vellore Institute of Technology University, Katpadi, Vellore, 632014, Tamilnadu, INDIA.
2* School of Computer Science Engineering & Information Systems, Vellore Institute of Technology University, Katpadi, Vellore, 632014, Tamilnadu, INDIA.

*Corresponding author(s). E-mail(s): [email protected];
Contributing authors: [email protected];
† These authors contributed equally to this work.

Abstract
Machine learning based on quantum computing principles has been found to offer an edge in the artificial intelligence and machine learning domains. Both domains share similarities in terms of higher-dimensional computation through complex linear algebra, and this has given rise to quantum machine learning. To make the most of the quantum computers available today, quantum machine learning algorithms are designed to run on Noisy Intermediate-Scale Quantum (NISQ) devices using hybrid quantum-classical methods of execution. The IBM Qiskit quantum framework, together with existing classical machine learning frameworks such as scikit-learn and TensorFlow, is used in this study of how a quantum support vector machine algorithm can be used effectively when the classical support vector machine lacks the computational power. In this paper, an insightful discussion of how quantum principles are involved in gaining advantages over classical machine learning is provided. To gain the fullest advantage of present-day quantum computers, a hybrid classical-quantum-classical mode of execution is used. An in-depth discussion of the pros and cons of using quantum machine learning for quantum support vector machines is carried out with an experiment.

Keywords: IBM Q-Experience, Machine Learning, Qiskit, Quantum Computing, Quantum Machine learning, Qubit, Support Vector Machine, Variational Quantum Circuit

1 Introduction
Quantum computers are regarded as more powerful computing machines than classical computing machines. Quantum mechanical principles such as superposition, interference, and entanglement enable parallelism in the computation of quantum algorithms [1]. A function over variables in a range of values can be executed concurrently on a quantum computer; this parallelism in execution is attained through the quantum mechanical principles. A classical machine executes a function only in sequential mode, which is a limitation of classical computers relative to quantum computers. Machine learning (ML), a sub-field of artificial intelligence, aims to find the underlying patterns in large available data sets. These patterns are learned through a process of training on the data sets; this way of learning from data is called developing a machine learning model. The developed model is then used to recognize these patterns in unseen data. Machine learning can be broadly classified as
1. Supervised learning: learning happens from a labeled data set while building the model to find patterns in unseen data.
2. Unsupervised learning: learning happens without labeled data while building a model to find patterns in unseen data.
3. Reinforcement learning: learning happens through events and experiences. Each experience is rewarded, and with the help of these rewards the learning from data is done to build the model [2].
Quantum computing and machine learning together constitute quantum machine learning (QML). Machine learning using quantum mechanical principles on quantum computers has shown advantages over classical machines in solving hard learning problems; quantum machines are claimed to learn in a better way. Quantum machine learning models are mathematically similar to classical machine learning models according to Biamonte et al. [3]. These QML algorithms were found to be more robust in learning to solve hard problems according to Peter Wittek [4].

In the current discussion, kernel methods for classifiers are taken up for study with a support vector machine (SVM) model. The support vector machine is considered a powerful supervised machine learning model, used as a classifier or as a regressor in solving certain real-world problems [5]. In support vector machines, kernel methods are used for classification of both separable and inseparable data. Kernel methods help in building separable models from inseparable data by mapping the data to a higher-dimensional space, and they provide similarity measures between the objects. In support vector machines, the margin of separation between the data is maximized by formulating the problem analytically as a constrained optimization. Applying the kernel method on inseparable data to find fat margins in a higher-dimensional space is called the kernel trick; it is done by finding the inner products in the higher-dimensional space without explicitly visiting that space, as stated by Rebentrost et al. [6-8]. Classical computers are limited by their computation space in finding the inner products of vectors in higher-dimensional spaces, and quantum computers are found to be more efficient in addressing this problem. The process of converting classical data to quantum data is called encoding. This encoding inherently involves a mapping of the classical lower-dimensional data to a higher-dimensional space called a Hilbert space, according to Maria Schuld in [9]. This is one of the key aspects of the speedups and of how a quantum support vector machine can produce an exponential speed-up in producing inner products.
The paper is organized as follows. Section 2 gives a brief understanding of quantum computing in 2.1, quantum machine learning in 2.2, the classical support vector machine in 2.3, and the quantum support vector machine in 2.4. Section 3 explains the algorithmic process of how the classical support vector machine works and when we can use quantum support vector machines, with a flow diagram. Section 4 discusses the implementation of the experimental work on an IBM quantum machine through the Qiskit framework and the classical SVM in scikit-learn. Section 5 analyzes the quantum support vector machine implementation on the quantum machine and compares it with classical machines on the data. The final section 6 concludes with an insight into how a QSVM is useful in building a better classifier for learning.

2 Background
2.1 Quantum Computing
Quantum computers have been found to be efficient in computing relative to classical machines, gaining quadratic or exponential speed-ups over the best-known classical algorithms. The fundamental building block for information processing in quantum computers is the qubit, which is analogous to the classical bit. The classical bit has two legitimate states of operation, termed '0' and '1', whereas a qubit can be in '0', in '1', or in any superposition (linear combination) of the states |0⟩ and |1⟩. The basis {|0⟩, |1⟩} is called the computational basis. A qubit in general form is |ψ⟩ = α|0⟩ + β|1⟩, where α and β are probability amplitudes satisfying |α|² + |β|² = 1. More generally, we can visualize a qubit with the help of the Bloch sphere, as shown in figure 1, which gives a visual representation of a single-qubit state. The qubit state |ψ⟩ is given by

\[ |\psi\rangle = e^{i\gamma}\left(\cos\frac{\theta}{2}\,|0\rangle + e^{i\varphi}\sin\frac{\theta}{2}\,|1\rangle\right) \]

or

\[ |\psi\rangle = \cos\frac{\theta}{2}\,|0\rangle + e^{i\varphi}\sin\frac{\theta}{2}\,|1\rangle \]

where 0 ≤ θ ≤ π and 0 ≤ φ ≤ 2π. The global phase factor e^{iγ} has no measurable effect on the qubit. On measurement, the qubit yields state |0⟩ with probability |α|² and state |1⟩ with probability |β|².

In vector notation, the qubit basis states are represented as

\[ 0 \rightarrow |0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad 1 \rightarrow |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \]

To build intuition, the qubit state vector |0⟩ can be regarded as the spin-up state of an electron: the first element gives the amplitude of the up state and the second element that of the down state. In quantum notation, |ψ⟩ denotes an arbitrary superposition of the basis states, |ψ⟩ = α|0⟩ + β|1⟩, i.e. a linear combination of the spin-up state with amplitude α and the spin-down state with amplitude β. The state representation on a Bloch sphere is shown in figure 1.

Fig. 1 Visualization of a qubit through a Bloch Sphere representation [10]

The outputs of quantum circuits are measurements performed on the qubits. On measurement, the state of a qubit is said to collapse onto the measured state in the measurement basis. Qubit states are noisy, which can lead to errors in the measurement outcomes. To overcome this, quantum algorithms are run a finite number of times and the result is inferred from the measured probabilities. Each execution is called a shot; the greater the number of shots, the better our estimate of the most probable outcome of the quantum circuit.
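As a small concrete illustration of shots, the following sketch (assuming the qiskit and qiskit-aer Python packages; the shot count of 1024 is an arbitrary choice) prepares one qubit in an equal superposition and estimates the outcome probabilities from repeated measurements.

from qiskit import QuantumCircuit
from qiskit_aer import AerSimulator  # assumes the qiskit-aer package is installed

# One qubit, one classical bit: prepare (|0> + |1>)/sqrt(2) and measure it.
qc = QuantumCircuit(1, 1)
qc.h(0)            # Hadamard gate creates an equal superposition
qc.measure(0, 0)   # measurement collapses the state onto |0> or |1>

# Run the circuit for a finite number of shots and estimate the probabilities.
shots = 1024
counts = AerSimulator().run(qc, shots=shots).result().get_counts()
probabilities = {bit: n / shots for bit, n in counts.items()}
print(probabilities)  # roughly {'0': 0.5, '1': 0.5}; more shots give tighter estimates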

2.2 Quantum Machine learning


Quantum machine learning is the process of deriving an algorithm that learns from data using quantum principles. The algorithm's output is a set of quantum states; measurements of these states, given out in the form of classical data, are used for training. The training can be for a discriminative task, a regression task, or a generative task. Quantum computers are advantageous but have certain limitations, chiefly noise in the qubits and the limited number of available qubits; the more qubits there are, the more noise arises between them. Presently available quantum computers are said to be in their infancy: they do not support quantum circuits of large depth and they suffer from both coherent and incoherent noise, as discussed by Shaib et al. [11]. Handling this requires more qubits for error mitigation, but today's quantum computers are limited in the number of usable qubits, and simulating a quantum computer with an increasing number of qubits is a very difficult task on classical computers. Noisy Intermediate-Scale Quantum (NISQ) technologies have been proposed so that the available quantum computers, with their limitations, can be used to the fullest extent possible [12]. The quantum and classical computers are connected in a loop-back fashion: the algorithms are run on shallow-depth quantum circuits for a finite number of shots, the circuit outputs are measured, giving classical outputs on a probability basis derived from the number of shots, and these classical data are given to a classical optimizer to check for optimal solutions. The loop-back between the classical and quantum processes is repeated with new parameters until optimal solutions are derived.
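To make the loop-back concrete, the following minimal sketch (an illustration only, not the algorithm used later in this paper) lets SciPy's COBYLA optimizer propose parameters for a one-qubit parameterized circuit whose Z expectation value, computed with Qiskit's Statevector, serves as the cost; the single Ry rotation and the Z observable are arbitrary choices made for brevity.

from scipy.optimize import minimize
from qiskit import QuantumCircuit
from qiskit.circuit import Parameter
from qiskit.quantum_info import Statevector, SparsePauliOp

theta = Parameter("theta")
ansatz = QuantumCircuit(1)
ansatz.ry(theta, 0)              # shallow parameterized quantum circuit

observable = SparsePauliOp("Z")  # quantity measured at the end of the circuit

def cost(params):
    """Quantum step: evaluate <Z> for the currently proposed parameters."""
    bound = ansatz.assign_parameters({theta: params[0]})
    return float(Statevector(bound).expectation_value(observable).real)

# Classical step: the optimizer proposes new parameters until <Z> is minimized.
result = minimize(cost, x0=[0.1], method="COBYLA")
print(result.x, result.fun)      # optimal angle near pi, expectation near -1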

Fig. 2 Variational Quantum Algorithm for hybrid quantum-classical algorithm for supervised learn-
ing, representing quantum and classical loop back connection [13]

Whether a quantum advantage is actually needed for a given learning task is well discussed by Maria Schuld et al. in [14]. Variational Quantum Algorithms (VQAs) use classical optimizers to train parameterized quantum circuits in order to gain quantum advantages. These VQAs are found to be analogous to some of the most successful machine learning models according to Cerezo et al. [15]. Parameterized quantum circuits are found to be highly expressive for machine learning applications. Quantum machines are used only when classical machines fail to perform the computational task, and the hybrid mode of classical and quantum machines in a loop-back is used to gain the quantum advantages [16]. Parameterized quantum circuits are used for encoding classical inputs into a quantum state of higher dimension. This higher-dimensional space is a Hilbert space, which supports quantum embeddings for machine learning according to Seth Lloyd et al. [17]. The embedded quantum data are found to be similar to kernel methods in machine learning models at higher-dimensional spaces, as stated by Maria Schuld in [9]. A classifier model built in this higher-dimensional space is termed a robust classifier according to Yunchao Liu et al. [18].
Several recent works on quantum machine learning algorithms, surveyed by Biamonte et al. [3], have shown that there is quantum gain in machine learning for problems such as solving systems of linear equations [19], the quantum support vector machine using the kernel model [6], quantum principal component analysis [20], quantum neural networks [21], quantum Bayes classifiers [22], and quantum gradient descent for linear systems and least squares [23]. Most of the quantum algorithms behind these machine learning models are based on famous algorithms such as Shor's integer factorization algorithm [24] and Grover's database search algorithm [25], which have become the building blocks of quantum machine learning programs. There are hardware and software challenges regarding the actual implementations, but the hope of solving complex problems is becoming real. The gain achieved by QML over ML methods is well discussed in [3][26]. Quantum machine learning for pattern classification can outperform classical machine learning methods, as stated by Maria Schuld et al. in [27][28].

2.3 Classical Support vector machine


Support vector machines are supervised learning algorithms used for solving classification problems. Based on the data feature vectors and the relations between them, the data can be classified into two subgroups separated by a decision-boundary hyperplane. The classification can be binary or multi-class. The SVM finds fat-margin boundaries that classify the data, with positive and negative hyperplanes having the support vectors on them. The objective of the support vector machine (SVM) is to find the maximum margin. In forming the fat margin, if the data is linearly separable the SVM forms a linear classifier model; when the data is not linearly separable, the data is transformed to a higher-dimensional space and the linear classification model is developed over that space. This technique is called the kernel trick, and different kernels are formulated classically to perform this task [5]. Out of the many possible boundaries, we need to find the one that maximizes the margin between the boundary and the closest data points.
Let the classical support vector algorithm take a training set of n samples S = {(x₁, y₁), ..., (xₙ, yₙ)} where xᵢ ∈ ℝᵈ and yᵢ ∈ {−1, 1}. In this discussion we confine ourselves to binary classification, but multi-class classification can be done by combining multiple binary SVM classifiers according to [30]. Assume that the training set has polynomial size, n = poly(d); the algorithm, in time poly(d, n), returns a set of optimal parameters (w, b) ∈ ℝᵈ × ℝ which define a linear classifier f*: χ → {−1, 1} as follows. When the data is linearly separable in a higher-dimensional space H, given some feature map φ: χ → H, there exist parallel supporting hyperplanes that divide the training data, given by ⟨w, φ(·)⟩ + b = y ∈ Y. The goal is to maximize the fat margin between these hyperplanes; this is the hard-margin classifier. To maximize the margin even further, the linear-separability condition is relaxed by adding a penalty term ξᵢ, giving a soft-margin classifier.

Fig. 3 Hyperplane separation between the data points along with positive and negative margins parallel to the separating hyperplane to maximize the margin. The samples on the margins are called the support vectors [29]

The margin classifier is shown in figure 3. The ξᵢ are called slack variables and represent violations of the linear-separability condition. Since the margin is given by 2/‖w‖, by simple geometry the mathematical formulation of the support vector machine can be given by the primal formulation

\[ P^* = \min_{w,b,\xi}\ \frac{1}{2}\|w\|^2 + \frac{C}{2}\sum_{i=0}^{M-1}\xi_i^2 \]  (1)

such that yᵢ(⟨w, φ(xᵢ)⟩ + b) ≥ 1 − ξᵢ. Equation 1 is a quadratic optimization problem, and we can take the dual of the function via the Lagrangian formulation

\[ L(\vec{w}, b, \vec{\alpha}) = \frac{1}{2}\|\vec{w}\|^2 + \sum_{i=1}^{n}\alpha_i\left(1 - y_i(\vec{w}\cdot\vec{x}_i - b)\right) \]  (2)

where the primal is P* = min_{w,b} max_{α≥0} L(w, b, α) and the dual is d* = max_{α≥0} min_{w,b} L(w, b, α), given by

\[ d^* = \max_{\alpha\ge 0}\ \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j K(x_i, x_j) \]  (3)

subject to \(\sum_{i=1}^{n}\alpha_i y_i = 0\), and

\[ L(\vec{\alpha}) = \sum_{j=1}^{M} y_j\alpha_j - \frac{1}{2}\sum_{j,k=1}^{M}\alpha_j K_{jk}\alpha_k \]  (4)

The dual maximizes over the KKT multipliers α = (α₁, ..., α_M)ᵀ subject to

\[ \sum_{j=1}^{M}\alpha_j = 0, \qquad y_j\alpha_j \ge 0 \]

The hyperplane parameters are then recovered; only a few of the αⱼ are non-zero, corresponding to the vectors that lie on the two optimal hyperplanes, with

\[ \vec{w}^* = \sum_{j=1}^{M}\alpha_j\vec{x}_j, \qquad b^* = y_i - \vec{w}^*\cdot\vec{x}_j \]

Note that K_{jk} = K(xⱼ, xₖ) = xⱼ · xₖ is the kernel matrix, or more generally K(xⱼ, xₖ) = ⟨φ(xⱼ), φ(xₖ)⟩ for all xⱼ, xₖ ∈ χ.
From the kernel matrix evaluation, we find the output classification for unseen test data as

\[ y_{pred} = y(\vec{x}) = f^*(x) = \mathrm{sign}(\langle w, x\rangle + b) \]  (5)

In the dual form, we can classify unseen data with a binary classifier using equation 6:

\[ y(\vec{x}) = \mathrm{sign}\left(\sum_{j=1}^{M} y_j\alpha_j K(\vec{x}_j, \vec{x}) + b\right) \]  (6)

where ⟨w, x⟩ = Σᵢ wᵢxᵢ. For every data vector xᵢ with true label yᵢ, it is easy to see that the classifier is correct on this point if and only if yᵢ(⟨w, xᵢ⟩ + b) > 0. We say a training set S is linearly separable if there exists (w, b) such that

\[ y_i(\langle w, x_i\rangle + b) > 0 \quad \text{for every } (x_i, y_i) \in S \]  (7)


and such a ⟨w, b⟩ is called separating hyperplane parameter for S in Rd where w is
the normal vector to the classifier hyperplane, and b is the bias parameter of the plane.
2
The margin is given by two parallel hyperplanes separated by a distance of ||→ −
w ||
with
no data points inside the margins. The equation 2 shows the lagrangian for SVM in
dual form for which primal is P ∗ = min →

maxL(→ −
w , b, →

α ) Now when if the data is not
w ,b α≥0
at all linearly separable at the d -dimensional vector space then we need to transform
it into n-dimensional vector (n >> d) through a feature map:

φ : χ → Rn (8)

where we assume the φ is normalized or it maps a unit vector to a unit vector. The
feature map is chosen before seeing the training data. In the dual formulation, training
data is accessed only via the kernel matrix K ∈ Rn×n , where n is the number of
training samples. The kernel matrix is a positive semi-definite defined as

K(xi , xj ) = ⟨φ(xi ), φ(xj )⟩ (9)

Through this feature mapping, we can work with an exponentially large or higher
dimensional space as long as the kernel is computable in time poly(d). Also for any
feature map φ0 : χ →√Rn , we can always use a new feature map φ : χ → Rn+1 such
that φ(x) = (φ(x), 1)/ 2, which allows us to remove bias term b. This can be done by
changing the kernel as K(xi , xj ) = 21 (K0 (xi , xj ) + 1). without loss of generality, we
can run a suitable kernel, following the dual of the optimization function.
These kernels provide a similarity measure between the objects for classification in a higher-dimensional space. Mathematically, a kernel is defined as K(xᵢ, xⱼ) = ⟨φ(xᵢ), φ(xⱼ)⟩, where K is the kernel function, xᵢ and xⱼ are d-dimensional inputs, φ is a mapping from the d-dimensional space to an n-dimensional space, and ⟨·,·⟩ denotes the dot product; usually n is much larger than d. Normally, calculating the inner product ⟨φ(xᵢ), φ(xⱼ)⟩ requires us to compute φ(xᵢ) and φ(xⱼ) first and then take the dot product. These computation steps can be quite expensive, as they involve manipulations in the n-dimensional space, where n can be a large number: mapping our features into a higher-dimensional space is expensive. With the kernel trick, we can compute the inner product in the higher-dimensional space without visiting the higher-dimensional space, or even without knowing what the higher-dimensional space is. In machine learning, different kernels have been developed to map low-dimensional vectors to a higher-dimensional space through the kernel trick. The popular kernels are the polynomial kernel, Gaussian kernel, radial basis function (RBF) kernel, Laplace RBF kernel, sigmoid kernel, etc. [7].
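The kernel trick can be checked numerically. For the homogeneous degree-2 polynomial kernel on two-dimensional inputs, an explicit feature map is φ(x) = (x₁², √2·x₁x₂, x₂²), and the inner product in that three-dimensional space equals (x·y)² computed directly in the original space. The short NumPy sketch below (with arbitrary illustrative values) verifies the identity.

import numpy as np

def phi(x):
    """Explicit degree-2 polynomial feature map for 2-D inputs."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def poly_kernel(x, y):
    """Kernel trick: the same inner product without visiting feature space."""
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([0.5, -1.5])

print(np.dot(phi(x), phi(y)))  # explicit higher-dimensional inner product: 6.25
print(poly_kernel(x, y))       # identical value from the low-dimensional data: 6.25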
The cost of computing the kernel function is expected to scale exponentially with the size of the data and the number of features; these computations are termed hard problems in classical computing. In the kernel construction, the kernel function is queried M(M−1)/2 times with a complexity of O(N) per entry, and the quadratic programming takes O(M³ log(1/δ)) to find α* for a non-sparse kernel matrix. Modern programming methods try to reduce the order of these complexities, but it remains as high as O(M²N) for kernel generation and quadratic programming together. As stated, the classical feature mapping to higher dimensions is not always feasible; it is feasible only when the kernel matrix is largely sparse, so that only the support-vector points contribute to the kernel evaluation and the remaining terms become zero, which is not always the case. When more data vectors contribute to the kernel matrix, classical machines fail to compute it. To reduce the order of complexity to O(MN), quantum algorithms can be used [31]; they can be used in hybrid classical-quantum loop-back architectures on NISQ devices. According to [32], a quantum advantage can be gained in these computations only when the kernel cannot be estimated classically.

2.4 Quantum support vector machine


Quantum computers are found to be efficient in solving classically hard problems. According to Rebentrost et al. [6], the quantum support vector machine can gain an exponential speedup in evaluating inner products, which allows quantum kernel machines to perform the kernel evaluation directly in the higher-dimensional space. All this is possible only if the data is provided in a coherent superposition to the quantum computer. The superposition principle is very helpful in classification problems, achieving a higher degree of precision, recall, and F-measure when compared to classical decision-making techniques for classification [33]. However, if the data is provided in classical formats, the method stated by Rebentrost et al. in [6] cannot be applied for classification directly; this is well stated by Havlíček et al. [32], because it requires qRAM, and at the present state of quantum technology we do not have functionally fault-tolerant, error-corrected qRAM. The QSVM using the least-squares SVM is found to work well on quantum hardware when qRAM is available, but the least-squares SVM model can lead to overfitting according to Ding et al. [34]. The QSVM is found to be beneficial in classification problems where the training vectors are dense and polynomial kernels are required on classical machines, according to Zhaokai Liu et al. [35]. The presently available quantum computers, with noise in the system, can be used to the fullest extent using hybrid classical-quantum loop-back architectures on NISQ devices. The quantum advantage is said to be gained only when the kernels cannot be estimated classically [32]. The quantum support vector machine gains an exponential factor in evaluating inner products, allowing quantum kernel evaluations directly in higher-dimensional spaces. This is possible when data is provided in quantum superposition states or when classical data is embedded into quantum data. Classical-to-quantum conversion is done through encoding, which transforms the data into higher-dimensional spaces by the quantum principle itself. Quantum feature maps provide advantages with different encoding strategies, using feature maps such as Pauli feature maps, Z feature maps, and ZZ feature maps, with entanglement of the data through qubit interactions. The map generated through these encoding feature maps uses the properties of superposition and entanglement through Hadamard and controlled-rotation gate operations, with respect to the parameters of the encoding. Figure 4 shows the encoding of the feature maps. The crucial point is that the interactions between the encoding quantum gates are essentially the interactions between the features of the data set at high dimension, capturing complex correlations among the data. This is the key quantum advantage gained by quantum support vector machines in transforming to a higher-dimensional space when building the kernel matrix functions. The measurable quantities of the entangled, interacting qubits can be understood as the interactions between the data. These quantum kernel functions are measured, converted suitably to classical format, and then given back to classical machines for support vector classification, which becomes easy for them since the higher-level interactions in the data are already mapped in the quantum space. Learning these interactions while building a classifier in a higher-dimensional space is a key advantage of using quantum computers with variational quantum circuits operating in hybrid quantum-classical approaches, according to Maria Schuld et al. [9].
It can be inferred that when the data is mapped with a quantum feature map and a quantum unitary classifier is run on the quantum machine, the classification is much more efficient on complex data and improved accuracy can be gained, according to Yunchao Liu et al. [18]. These quantum kernels are said to provide better accuracy and show quantum speed-up in classification. The classical data vectors are mapped into quantum states,

\[ x \rightarrow |\phi(x)\rangle\langle\phi(x)| \]

where the global phase is avoided by the density matrix representation. The kernel function is the inner product between density matrices,

\[ K(x_i, x_j) = |\langle\phi(x_i)|\phi(x_j)\rangle|^2 \]  (10)

The quantum feature map is implemented via a quantum circuit parametrized by x,

\[ |\phi(x)\rangle = U(x)\,|0^n\rangle \]  (11)

The feature map uses n qubits, and the kernel function is then

\[ K(x_i, x_j) = |\langle 0^n|U^\dagger(x_i)U(x_j)|0^n\rangle|^2 \]  (12)

i.e. the quantum circuit U†(xᵢ)U(xⱼ) is run on the input |0ⁿ⟩ and the probability of measuring |0ⁿ⟩ at the output is recorded. When we run the parametrized unitary operations over the data with a sufficient number of rotations, we get good accuracy, as the encoding is not biased towards the data and good classification can be expected. Due to the coherent and incoherent noise in the quantum circuits, however, we can only use a limited number of unitary rotations, and this limit again depends on the type of data given to the quantum oracle circuit. In quantum machines the system is fully random and unbiased towards the data. When a quantum kernel matrix is built using a quantum computer, it has a high degree of expressibility for building a good classifier; this kind of robustness and quickness in classification can be expected using quantum computing [36][8]. Just as different classical kernels can be used to find different kinds of separations in the data, different quantum feature maps with different types and numbers of rotations and repetitions can generate different kinds of separation of the data in a robust manner [18]. How this advantage is gained is discussed subsequently.
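A single quantum-kernel entry of equation 12 can be estimated on a state-vector simulator along the following lines; this is a minimal sketch assuming Qiskit's ZZFeatureMap and Statevector classes, with two arbitrary two-dimensional data points (one qubit per feature).

import numpy as np
from qiskit.circuit.library import ZZFeatureMap
from qiskit.quantum_info import Statevector

# Two illustrative 2-dimensional data points; one qubit per feature.
xi = np.array([0.3, 1.2])
xj = np.array([0.7, 0.4])

feature_map = ZZFeatureMap(feature_dimension=2, reps=2)

# |phi(x)> = U(x)|0...0>, obtained by binding the data to the circuit parameters.
phi_i = Statevector(feature_map.assign_parameters(xi))
phi_j = Statevector(feature_map.assign_parameters(xj))

# K(xi, xj) = |<phi(xi)|phi(xj)>|^2, as in equation 12
kernel_entry = np.abs(phi_i.inner(phi_j)) ** 2
print(kernel_entry)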
Data Handling in QSVM
The steps in the quantum support vector machine learning process are as follows (a small hand-built sketch of these steps is given after the list):
• In preparation of the initial state of the system, all the qubits are initialized to |0⟩.
• To encode the classical data into the prepared qubits, a set of unitary transformations is performed on the input register. These unitary transformations are a set of Hadamard operations, followed by rotation gate operations such as Rx, Ry, Rz (the Pauli rotation operations) and a set of controlled operations, depending on the quantum machine learning algorithm being implemented.
• The parameterized quantum circuit of that particular ML model's QML algorithm is run, with some repetitions and entangled feature maps.
• Finally, the measurement output is taken, converted to the respective classical data value, and fed back to the classifier optimizer. This loop-back between quantum and classical processing is repeated until optimization of the classifier is achieved.
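A hand-built sketch of these data-handling steps is given below; it loosely mirrors the structure of a second-order Z/ZZ encoding (Hadamards, data-dependent Pauli rotations, and a controlled entangling interaction), and the feature values are arbitrary illustrative numbers rather than data from the experiments in this paper.

import numpy as np
from qiskit import QuantumCircuit

x = [0.8, 1.7]                 # example classical feature values (one per qubit)

qc = QuantumCircuit(2, 2)      # Step 1: all qubits start in |0>
qc.h([0, 1])                   # Step 2: Hadamards create superposition
qc.rz(2 * x[0], 0)             # data-dependent Pauli rotations encode the features
qc.rz(2 * x[1], 1)
qc.cx(0, 1)                    # controlled operation entangles the feature qubits
qc.rz(2 * (np.pi - x[0]) * (np.pi - x[1]), 1)
qc.cx(0, 1)                    # Step 3 would repeat this block with more repetitions
qc.measure([0, 1], [0, 1])     # Step 4: measurement returns classical bit strings
print(qc.draw())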

Fig. 4 Encoding classical data into quantum entangled feature map using one and two-qubit gates
with a 4 qubit operation [37]

From figure 4 we see that a maximal superposition state is prepared with the help of Hadamard gates and then passed to the n-qubit unitary parameterized quantum circuit; running the parameterized circuit's unitary creates the feature-space transformation of the data. The data is not passed to the quantum circuit as individual qubit states; instead, an entangled, superposed state space is used, in the form of the parameters of the unitary gates U_Φ(x) and U_Φ(y), representing a high-dimensional feature map.
Near-term quantum machine algorithms run in a hybrid classical-quantum approach, and the quantum outputs are measured into classical data. As discussed in section 2.4, the circuit is run a finite number of times until a stable, optimized output is obtained. Hadamard transformations between the quantum unitary operations are applied so that more randomness and uniformity in the action of the unitary operations on the data is achieved on the qubits. This makes the quantum classifier more robust in building a classifier model according to [18]. The Hadamard transformation helps the quantum classifier to find the patterns in a more quantum-mechanical way, through the properties of superposition and interference.

3 Step-by-step working process of SVM and QSVM


The following flow of steps gives a brief understanding of how the SVM works. The data pre-processing is common to both the classical and quantum methods. The SVM and QSVM workflows are shown in two different diagrams.

Figure 5 illustrates the steps in the flow of the classical support vector machine.
1. Step 1: Input the data set to the algorithm with N-dimensional data samples.

2. Step 2: Pre-process the data: refinement of the data, normalisation of the data.

3. Step 3: Feature extraction and dimensionality reduction using the PCA algorithm.

4. Step 4: Split the data into training (80%) and testing (20%) subsets.

5. Step 5: Training. Use the M training data samples (xᵢ, yᵢ), xᵢ ∈ ℝᴺ, yᵢ = ±1. Let w be the normal vector to the decision boundary. The margin is given by two parallel hyperplanes separated by a distance of 2/‖w‖ with no data points inside the margin. The objective and constraint functions are discussed in section 2.3.

6. Step 6: Optimization. In optimization we take the dual of the SVM: the Lagrangian formulation of the primal function is formed and we compute the kernel function over the training data, as discussed in section 2.3.

7. Step 7: Testing. From the kernel matrix evaluation we find the output classification for the unseen testing data,

\[ y(\vec{x}) = \mathrm{sign}\left(\sum_{j=1}^{M} y_j\alpha_j K(\vec{x}_j, \vec{x}) + b\right) \]

The result is a binary classifier for a new vector x, which returns −1 or +1.

8. Step 8: Evaluation. Compare the predicted y(xᵢ) against the actual label yᵢ to check whether the classification is correct for all the testing data, and compute the accuracy scores.

9. Step 9: Check whether the desired accuracy is met. If yes, stop; else go to Step 10.

10. Step 10: Go to Step 5 and reiterate the process up to Step 9, or opt for the quantum SVM if the requirement is still not met (a scikit-learn sketch of this classical workflow is given after the list).
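The classical workflow of Steps 1 through 8 can be sketched with scikit-learn as follows. The breast-cancer data set, the five-component PCA and the 80/20 split match the experimental setup described later in the paper, while the RBF kernel is just one of the kernels compared there; the random_state value is an arbitrary choice.

from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Steps 1-2: load and normalise the data
X, y = load_breast_cancer(return_X_y=True)
X = MinMaxScaler().fit_transform(X)

# Step 3: dimensionality reduction with PCA
X = PCA(n_components=5).fit_transform(X)

# Step 4: 80% training / 20% testing split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Steps 5-6: training solves the dual optimisation with the chosen kernel
clf = SVC(kernel="rbf").fit(X_train, y_train)

# Steps 7-8: classify the unseen data and compute the accuracy score
print(accuracy_score(y_test, clf.predict(X_test)))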

Fig. 5 Flow diagram for Classical SVM Classifier

The QSVM steps are common up to Step 4; it is suggested that only when the classical SVM cannot classify with the desired accuracy after a finite number of iterations of Steps 5 through 8 should the quantum SVM method be used. Quantum computers are good at manipulating vectors and tensor products in high-dimensional spaces, and computing the inner products between large vectors takes less time in the quantum regime than in the classical one [38]. For the quantum SVM there are two different methods, proposed by Havlicek et al. [32]. In the first method, classical data is encoded into a quantum state, a suitable feature map is used as the quantum oracle, the parametrized operations are performed a finite number of times, and finally a measurement is made to find the classification of the input vector through a method such as post-measurement parity processing. This method is found to be less accurate than the other method on large data sets. The other method is to encode the classical data into a quantum feature map, compute the pairwise products, and find the inner products through the swap-test method to obtain the kernel matrix; this kernel matrix is then given to a classical SVM for classification. This second method, using a classical-quantum machine interface, is found to be better than the former. For this purpose, the state vector simulator of the quantum machine was also able to generate the kernel-matrix inner products used in our experiments [39].
Figure 6 illustrates the steps shown in the flow chart of the quantum support vector machine.
1. Step 1: State preparation of the input quantum circuit. The required number of qubits is initialized to state |0⟩, and then the input state vector, i.e. the pre-processed data, is given as the input.

2. Step 2: Upload the classical data (encoding). Encode the classical data on the quantum device with a quantum circuit through the generated feature maps. As an example, the Z feature map used for the encoding process is

\[ U_{\Phi(\vec{x})} = \exp\left( i \sum_{S\subseteq[n]} \phi_S(\vec{x}) \prod_{j\in S} Z_j \right) \]

where Zⱼ acts on qubit j. For a first-order, single-qubit feature map with no entanglement, S ∈ {0, 1, ..., n−1} and φᵢ(x) = xᵢ. For the second order, with two or more qubits with entanglement, S ∈ {0, 1, ..., n−1, (0,1), (1,2), ..., (n−2, n−1)} and φᵢ,ⱼ(x) = (π − xᵢ)(π − xⱼ), and the corresponding feature map with entanglement is generated with ZZ rotations.

3. Step 3: Optimal classifier process. Parameterize the encoded qubits with rotational gates using the various feature maps, such as the ZZ feature map or a Pauli feature map, with a finite number of repetitions of the parametric oracle; then choose one of the methods for finding the optimal classification parameters and run the classification algorithm.

4. Step 4a: VQC-based QSVM classifier process. The encoded circuit qubits are given to a variational circuit ansatz for a finite number of repetitions of the VQC, and finally measurements are taken to find the output classifications. The number of repetitions and the parametrization are optimized with the training data, and optimal θ values and the number of repetitions are found during training. These parameters are used to find the target data classification during testing.

5. Step 4b: Quantum kernel estimation classifier process. Take the pairwise data points and build the circuit that computes the pairwise inner products to produce the quantum kernel matrix. This kernel matrix is given to an efficient classical SVM for the classification process. The process is run a finite number of times with the training data, and finally the optimal kernel matrix estimate is used for the classification of the target test data.

6. Step 5a: Using the kernel estimate K(xⱼ, xₖ) of the quantum kernel measured from the output of the quantum circuit in

\[ y(\vec{x}) = \mathrm{sign}\left(\sum_{j=1}^{M} y_j\alpha_j K(\vec{x}_j, \vec{x}_k) + b\right), \]

the result of a binary classifier for a new target test vector x can be found, which returns −1 or +1.

7. Step 5b: Testing with the VQC circuit's optimal parameters to find the target test data classification using the rule

\[ y(\vec{x}) = \mathrm{sign}\left(\frac{1}{2^n}\sum_\alpha w_\alpha(\theta)\,\Phi_\alpha(\vec{x}) + b\right), \]

which returns −1 or +1. (A sketch of the kernel-estimation variant, Steps 4b and 5a, is given after the list.)
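Steps 4b and 5a can be sketched by feeding a quantum kernel matrix to a classical SVC with a precomputed kernel. The sketch below assumes qiskit-machine-learning's FidelityQuantumKernel class (older releases expose a similar QuantumKernel class instead) and uses tiny illustrative arrays rather than the paper's data sets.

import numpy as np
from qiskit.circuit.library import ZZFeatureMap
from qiskit_machine_learning.kernels import FidelityQuantumKernel  # assumed available
from sklearn.svm import SVC

# Tiny illustrative training/test sets with two features (two qubits)
X_train = np.array([[0.1, 0.9], [0.8, 0.2], [0.2, 0.8], [0.9, 0.1]])
y_train = np.array([0, 1, 0, 1])
X_test = np.array([[0.15, 0.85], [0.85, 0.15]])

feature_map = ZZFeatureMap(feature_dimension=2, reps=2, entanglement="linear")
qkernel = FidelityQuantumKernel(feature_map=feature_map)

# Step 4b: pairwise inner products form the quantum kernel matrices
K_train = qkernel.evaluate(x_vec=X_train)
K_test = qkernel.evaluate(x_vec=X_test, y_vec=X_train)

# Step 5a: a classical SVC consumes the precomputed quantum kernel
clf = SVC(kernel="precomputed").fit(K_train, y_train)
print(clf.predict(K_test))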

Fig. 6 Flow diagram for QSVM Classifier

4 Implementation of QSVM on IBM-Quantum
machine using Qiskit
The implementation of the quantum support vector machine is done using the IBM Quantum state vector simulator. To understand the effect of feature maps on building the support vector machine kernel, different data sets with different sizes and numbers of features are used. The data sets are pre-processed using data normalization techniques, and the number of features is reduced using principal component analysis (PCA). Each feature is fed to a qubit, so the number of features is equal to the number of qubits. The training and testing data after dimensionality reduction are used to build a support vector machine model using the QSVM kernel-based approach; this is one of the methods discussed for the implementation of a quantum support vector machine by Havlicek et al. [32]. From the prepared, pre-processed data, 80% is used for training and 20% for testing. We use the generated feature maps to encode the classical data into quantum states and build the quantum support vector machine kernels; measuring these quantum kernels gives the evaluated high-dimensional kernels, which are given to the classical support vector machine for classification. The IBM Quantum Qiskit framework is used for this purpose, and the data sets from scikit-learn are used for this study. Data sets of different sizes are tested over combinations of different types of quantum feature maps generated through the Qiskit feature-map generation application programming interfaces (APIs). Feature maps with different combinations, with and without entanglement, are generated, and transpilation of the quantum circuits is done.
The following are the step-by-step processes to build a quantum support vector machine on IBM Quantum using the Qiskit framework. Python 3 is used for this purpose with Qiskit. The simulations are run to find the best-fit Pauli feature map combination giving the best accuracy for a given data set. The best-fit Pauli feature map is not unique across data sets; it is observed that different feature maps work well for different data sets.
1. Step 1: Load the required Python library packages from sklearn, qiskit, numpy, and
matplotlib.
2. Step 2: Load the data set from scikit-learn.
3. Step 3: Split the data into training and test data sets.
4. Step 4: Reduce the number of features required for building the model and apply
the normalization techniques to the data.
5. Step 5: Generate the feature maps using the Qiskit API function with varied depths,
repetitions, and entanglement features, and then bind the data to the generated
feature maps.
6. Step 6: Generate the quantum kernel using the quantum state-vector simulator as
the back-end quantum simulator.
7. Step 7: Quantum kernels are evaluated from the kernel matrix generated by the
back-end simulators using the training data sets.
8. Step 8: The kernel matrix generated in the previous step is given to the classical support vector machine and evaluated on the test data to derive the accuracy scores attained by the model. The quantum kernel evaluations are compared against different classical kernel evaluations, such as the linear kernel, polynomial kernel, radial basis function kernel, and sigmoid kernel, over the same training and test data.
The detailed process of running the code is explained in the tutorials provided by Qiskit Global Summer School 2021: https://github.com/Qiskit/platypus/blob/main/notebooks/summer-school/2021/lab3.ipynb
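Step 5 of the list above, generating the different feature-map variants, can be sketched with Qiskit's feature-map constructors. The Pauli strings, repetition counts and entanglement options below are illustrative combinations of the kind compared in section 5, not an exhaustive reproduction of the 18 cases considered there.

from qiskit.circuit.library import PauliFeatureMap, ZZFeatureMap

n_features = 5  # e.g. after PCA reduction to 5 components

# Pauli feature maps with different rotation combinations and repetition counts
pauli_maps = {
    f"{'-'.join(paulis)}_reps{reps}": PauliFeatureMap(
        feature_dimension=n_features, paulis=paulis, reps=reps
    )
    for paulis in (["Z"], ["X", "Y"], ["Z", "ZZ"])
    for reps in (1, 2, 3)
}

# ZZ feature maps with the three entanglement patterns discussed in section 5
zz_maps = {
    ent: ZZFeatureMap(feature_dimension=n_features, reps=2, entanglement=ent)
    for ent in ("linear", "circular", "full")
}

# Inspect the resulting circuit depths before binding data and building kernels
for name, fmap in {**pauli_maps, **zz_maps}.items():
    print(name, "depth:", fmap.decompose().depth())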

5 Analysis
We have performed experiments to find the accuracy scores for classification using classical SVM kernels compared against QSVM kernels. To build the QSVM kernels, various feature maps are generated with the Pauli rotational unitary quantum gates. Different types of feature maps are generated: combinations of the X, Y, and Z Pauli rotation gates, with two-gate and three-gate rotation combinations, are tried, and we have considered 18 such combinations. We have also taken up the ZZ feature map generation given by the Qiskit API; the ZZ feature map is generated with three different types of entanglement: linear, circular, and full. For the classical kernels we have taken the linear kernel, polynomial kernel, RBF kernel, and sigmoid kernel. To understand the effect of the size of the data set, the algorithm is run with varied data sizes of 100, 200, 300, 400, and 500 samples. We have done the experiments with data sets such as the Breast Cancer data set, the MNIST handwritten digits data set, the IRIS data set, and the Wine data set; all these are free data sets imported from the scikit-learn library for study purposes. We have used the Qiskit tool library for the generation of feature maps and the IBM Quantum state vector simulator for the generation of the quantum kernel matrix.
For the current discussion we consider the Breast Cancer data set and the MNIST handwritten digits data set. The Breast Cancer data set has a total of 569 data samples with 30 different attributes per sample; from these attributes we should be able to classify each sample as either 'benign' or 'malignant'. The sample features are reduced from 30 to 5 using principal component analysis (PCA), and the data is pre-processed using normalization techniques with the API functions given by classical computing methods. To understand the effect of data set size, the algorithms are run with different sizes, with the samples picked at random using the functions given by scikit-learn. This final pre-processed data set is given to the classical kernel methods for classification, and the classification accuracy scores for the four classical kernels are taken for each data size separately; the test samples from the chosen sample size are used for evaluation. The same data is quantum-encoded through parameterized quantum circuits, quantum kernel estimates are derived, and these quantum-estimated kernels are given to the classical kernel method to find the classification accuracy scores. Accuracy scores for all the different combinations of Pauli feature maps and ZZ feature maps are recorded; the plot for 100 samples is shown in figure 7(a), and similarly for 200 samples in 7(b), 300 in 7(c), 400 in 7(d), and 500 in 7(e) for the Breast Cancer data set. In a similar manner to the Breast Cancer analysis, the MNIST handwritten digits data set, with image data in the form of 28×28 grayscale intensities and the first column labelled (0 to 9) for every image, is considered for classification. There are 10,000 images with labels 0 to 9, and the data for our purpose is selected at random from these in the required sizes. The accuracy scores for all the different combinations of Pauli feature maps and ZZ feature maps are recorded; the plot for 100 samples is shown in figure 8(a), and similarly for 200 samples in 8(b), 300 in 8(c), 400 in 8(d), and 500 in 8(e) for the MNIST handwritten digits data set. In generating the quantum kernel estimates, the feature maps are repeated 1, 2, 3, 4, and 5 times; this repetition of feature maps is mostly done to have more randomness and error correction in the quantum approach, as suggested by Abhinav Kandala et al. [40]. The classical accuracy scores with the various classical kernels are shown in table 1 for the Breast Cancer data set and in table 2 for the MNIST handwritten digits data set.

Table 1 Classification test accuracy scores derived from the best-known classical kernels for the Breast Cancer data set

Number of Data Samples Linear Kernel Poly Kernel RBF Kernel Sigmoid Kernel
100 0.96 0.96 0.96 0.90
200 0.94 0.90 0.94 0.85
300 0.94 0.92 0.94 0.90
400 0.91 0.92 0.92 0.77
500 0.92 0.92 0.92 0.80

Table 2 Classification test accuracy scores derived from the best-known classical kernels for the MNIST handwritten digits data set

Number of Data Samples Linear Kernel Poly Kernel RBF Kernel Sigmoid Kernel
100 0.75 0.68 0.84 0.83
200 0.92 0.88 0.91 0.92
300 0.91 0.88 0.92 0.88
400 0.93 0.92 0.90 0.85
500 0.92 0.92 0.90 0.90

The observations we can draw from the outputs on the Breast Cancer data set are as follows. Combinations of feature maps of the same type, say X-X, Y-Y, or Z-Z, without entanglement have significantly reduced classification accuracies. For the ZZ feature map, which has entanglement between the qubits, the case with linear entanglement had a higher accuracy score than circular entanglement, followed by full entanglement; the same trend is observed for all the data sizes. The data size has a negligible effect, or no effect, on the execution of the quantum kernels: almost all the data sizes showed the same trend in their classification accuracy scores.

Fig. 7 Breast Cancer data set with 100, 200, 300, 400, and 500 data samples in 7(a), 7(b), 7(c), 7(d), and 7(e) respectively. The accuracy scores with the quantum kernel are on the y-axis and the different feature maps on the x-axis.

Fig. 8 MNIST handwritten digits data set with 100, 200, 300, 400, and 500 data samples in 8(a), 8(b), 8(c), 8(d), and 8(e) respectively. The accuracy scores with the quantum kernel are on the y-axis and the different feature maps on the x-axis.

But when we look at the classical kernel accuracy scores, their accuracy with small data sets is much better than at higher volumes of data. We can infer that quantum kernels can handle larger volumes of data with the same accuracy score.
When we run the algorithm with only one repetition of the feature maps, we observe that the accuracy scores are lower for most of the feature maps. When we increase the number of repetitions, the effect differs between the different types of feature maps. The reason for this is the coherent and incoherent noise in the quantum circuits. Coherent noise can be caused by mis-calibrations due to over-rotation of qubits in the parameterized quantum circuits of the quantum kernels; this is also observed by Kristan Temme et al. in [41]. Coherent noise is attributed to shallow circuits. When we do more repetitions of the feature maps, the depth of the circuits tends to increase, and a reduction of accuracy scores is then observed due to decoherence noise. The reason for this is that the quantum circuits must be measured within the coherence time limit; if the measurement time of the deeper circuits exceeds the coherence time of the quantum circuit, the noise leads to inaccuracy in the measured state. Some combinations of feature maps are more immune to noise at certain shallow depths, and some are more immune at larger circuit depths, but the shallower the circuit, the higher the accuracy scores observed in the graphs. The trends observed with the Breast Cancer data set cannot be generalized to all types of data sets. With the MNIST handwritten digits data set a different trend is observed: with one feature map repetition we observe higher accuracy, and with an increase in the number of repetitions of the feature maps the accuracy score decreases. This trend is observed for all the feature maps and all the data sizes we considered. So we cannot generalize the interpretation of the number of feature-map repetitions to all types of data sets; we need to apply this criterion of feature-map repetition per data set. Using the same type of rotational gate transformation, such as X-X, Y-Y, or Z-Z, has been shown to reduce accuracy scores when compared to other combinations of X, Y, and Z rotational Pauli gate transformations. The interpretation of this kind of data dependence relates to the classical-to-quantum data encoding through parameterized quantum circuits: based on the encoding parameters on the quantum gates, the circuits respond differently to the noise characteristics of the quantum circuits, according to Cincio et al. in [42]. The findings of Havlicek et al. [32] gave us the basic idea of using feature maps, in particular ZZ feature maps, and our mathematical interpretation of this is shown in section 3. The mathematical interpretations for the other feature maps are given in the Qiskit textbook documentation: https://github.com/qiskit-community/qiskit-textbook.
In all these analyses and usages of feature maps for quantum kernel methods using quantum gates, we have not optimized the quantum kernel methods. There are different types of quantum computers according to Nielsen & Chuang [1]. We have used an IBM gate-based quantum computer state vector simulator, which is very efficient for gate-based quantum kernel estimation operations but not so efficient for optimization purposes. The annealing-based quantum computers designed by D-Wave are very good at solving optimization problems; the optimization loop-back is not considered in our study. The annealing-based machines are good at optimization problems but are not good at kernel-based estimations. To gain the fullest extent of quantum computers, both types could be used: the annealing-based optimization on D-Wave would be given the output of the quantum kernel from the IBM quantum computer, and the two would run in a loop-back fashion until optimization is attained. There are some practical difficulties in bridging these two types of quantum computers, since their present modes of operation are very different; how they can be bridged together for a better quantum-processing gain is a topic for future research.

6 Conclusion
In machine learning tasks, the classification problem is the most common one, and various models are used for it. One of the powerful learning models is the support vector machine classifier with kernel methods, which is used for both separable and inseparable data. Still, there are certain data for which these kernel methods cannot be computed with good accuracy on classical machines. In such cases, quantum kernel methods based on quantum support vector machine classifiers become robust classifiers owing to the quantum principles, which play a vital role in the execution of quantum machines. In this paper, insights into how these quantum principles are useful in building a robust quantum classifier were discussed. The comparison of classical and quantum kernel methods was discussed in detail to understand the advantages of quantum machines over classical machines. Learning using the quantum approach is found to compare well with the classical approach, given the limitations of present-day quantum computers, and we discussed when these quantum machines can be used. The quantum computers that are currently available cannot be used to their fullest extent, and it is still unclear what set of problems can be solved by available quantum computers in place of classical supercomputers. But when we have a problem that is intractable for a classical machine to solve or classify, we can use near-term quantum machines; the hybrid classical-quantum approach allows us to solve these problems by building a better classifier than classical classifier algorithms. This paper gives an insightful discussion comparing classical support vector machines with quantum support vector machines and of how the latter can be used on today's available quantum computers. Though several other algorithms have been proposed for quantum support vector machines, the practical approach with the quantum kernel method is found to be useful for these near-term quantum machines. This also helps in understanding which feature maps are useful in deriving a better kernel estimator for quantum support vector machines.

AUTHOR CONTRIBUTIONS
The authors confirm being the sole contributors to this work and have approved it
for publication.

CONFLICT OF INTEREST STATEMENT
The authors declare that they have no competing interests.

DATA AVAILABILITY STATEMENT


The original contributions presented in the study are included in the article.

CONSENT TO PARTICIPATE
Not applicable.

CONSENT FOR PUBLICATION


Not applicable.

References
[1] Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information:
10th Anniversary Edition. Cambridge University Press (2010)

[2] Bishop, C.M.: Pattern Recognition and Machine Learning. Springer (2006)

[3] Biamonte, J., Wittek, P., Pancotti, N., Rebentrost, P., Wiebe, N., Lloyd, S.:
Quantum machine learning. Nature 549(7671), 195–202 (2017) https://doi.org/
10.1038/nature23474

[4] Wittek, P.: Quantum Machine Learning: What Quantum Computing Means to
Data Mining. Academic Press (2014)

[5] Burges, C.J.C.: A tutorial on support vector machines for pattern recognition.
Data Mining and Knowledge Discovery 2(2), 121–167 (1998) https://doi.org/10.
1023/A:1009715923555

[6] Rebentrost, P., Mohseni, M., Lloyd, S.: Quantum support vector machine for big
data classification. Physical Review Letters 113(13) (2014) https://doi.org/10.
1103/physrevlett.113.130503

[7] Schölkopf, B., Smola, A.J., Bach, F., et al.: Learning with Kernels: Support Vector
Machines, Regularization, Optimization, and Beyond. MIT press, ??? (2002)

[8] Jäger, J., Krems, R.V.: Universal expressiveness of variational quantum classifiers
and quantum kernels for support vector machines. Nature Communications 14(1),
576 (2023) https://doi.org/10.1038/s41467-023-36144-5

[9] Schuld, M., Killoran, N.: Quantum machine learning in feature hilbert spaces.
Physical Review Letters 122(4) (2019) https://doi.org/10.1103/physrevlett.122.
040504

[10] Beckers, A., Tajalli, A., Sallese, J.-M.: A review on quantum computing: Qubits,

23
cryogenic electronics and cryogenic mosfet physics (2019)

[11] Shaib, A., Naim, M.H., Fouda, M.E., Kanj, R., Kurdahi, F.: Efficient noise mit-
igation technique for quantum computing. Scientific Reports 13(1), 3912 (2023)
https://doi.org/10.1038/s41598-023-30510-5

[12] Preskill, J.: Quantum computing in the nisq era and beyond. Quantum 2, 79
(2018) https://doi.org/10.22331/q-2018-08-06-79

[13] Macaluso, A., Clissa, L., Lodi, S., Sartori, C.: A variational algorithm for quantum
neural networks. In: Krzhizhanovskaya, V.V., Závodszky, G., Lees, M.H., Don-
garra, J.J., Sloot, P.M.A., Brissos, S., Teixeira, J. (eds.) Computational Science
– ICCS 2020, pp. 591–604. Springer, Cham (2020)

[14] Schuld, M., Killoran, N.: Is quantum advantage the right goal for quantum
machine learning? PRX Quantum 3, 030101 (2022) https://doi.org/10.1103/
PRXQuantum.3.030101

[15] Cerezo, M., Arrasmith, A., Babbush, R., Benjamin, S.C., Endo, S., Fujii,
K., McClean, J.R., Mitarai, K., Yuan, X., Cincio, L., Coles, P.J.: Variational
Quantum Algorithms (2020)

[16] Benedetti, M., Lloyd, E., Sack, S., Fiorentini, M.: Parameterized quantum cir-
cuits as machine learning models. Quantum Science and Technology 4(4), 043001
(2019) https://doi.org/10.1088/2058-9565/ab4eb5

[17] Lloyd, S., Schuld, M., Ijaz, A., Izaac, J., Killoran, N.: Quantum embeddings for
machine learning. arXiv preprint arXiv:2001.03622 (2020)

[18] Liu, Y., Arunachalam, S., Temme, K.: A rigorous and robust quantum speed-up
in supervised machine learning. Nature Physics, 1745–2481 (2021) https://doi.
org/10.1038/s41567-021-01287-z

[19] Harrow, A.W., Hassidim, A., Lloyd, S.: Quantum algorithm for linear systems
of equations. Physical Review Letters 103(15) (2009) https://doi.org/10.1103/
physrevlett.103.150502

[20] Lloyd, S., Mohseni, M., Rebentrost, P.: Quantum principal component analysis.
Nature Physics 10(9), 631–633 (2014) https://doi.org/10.1038/nphys3029

[21] Abbas, A., Sutter, D., Zoufal, C., Lucchi, A., Figalli, A., Woerner, S.: The power
of quantum neural networks. Nature Computational Science 1(6), 403–409 (2021)
https://doi.org/10.1038/s43588-021-00084-1

[22] Shao, C.: Quantum speedup of bayes’ classifiers. Journal of Physics A: Mathemat-
ical and Theoretical 53(4), 045301 (2020) https://doi.org/10.1088/1751-8121/
ab5d77

24
[23] Kerenidis, I., Prakash, A.: Quantum gradient descent for linear systems and least
squares. Physical Review A 101(2) (2020) https://doi.org/10.1103/physreva.101.
022316

[24] Shor, P.W.: Polynomial-time algorithms for prime factorization and discrete log-
arithms on a quantum computer. SIAM Journal on Computing 26(5), 1484–1509
(1997)

[25] Lov K, Grover: A fast quantum mechancial algorithm for Database Search, 212–
219 (1996) https://doi.org/10.1145/237814.237866 arXiv:9605043 [quant-ph]

[26] Huang, H.-Y., Broughton, M., Cotler, J., Chen, S., Li, J., Mohseni,
M., Neven, H., Babbush, R., Kueng, R., Preskill, J., McClean,
J.R.: Quantum advantage in learning from experiments. Science
376(6598), 1182–1186 (2022) https://doi.org/10.1126/science.abn7293
https://www.science.org/doi/pdf/10.1126/science.abn7293

[27] Schuld, M., Sinayskiy, I., Petruccione, F.: An introduction to quantum machine
learning. Contemporary Physics 56(2), 172–185 (2014) https://doi.org/10.1080/
00107514.2014.964942

[28] Zeguendry, A., Jarir, Z., Quafafou, M.: Quantum machine learning: A review and
case studies. Entropy 25(2) (2023) https://doi.org/10.3390/e25020287

[29] Gholami, R., Fakhari, N.: Chapter 27 - support vector machine: Prin-
ciples, parameters, and applications. In: Samui, P., Sekhar, S., Balas,
V.E. (eds.) Handbook of Neural Computation, pp. 515–535. Academic
Press, ??? (2017). https://doi.org/10.1016/B978-0-12-811318-9.00027-2 .
https://www.sciencedirect.com/science/article/pii/B9780128113189000272

[30] Giuntini, R., Freytes, H., Park, D.K., Blank, C., Holik, F., Chow, K.L., Sergioli,
G.: Quantum state discrimination for supervised classification. arXiv preprint
arXiv:2104.00971 (2021)

[31] Park, S., Park, D.K., Rhee, J.-K.K.: Variational quantum approximate support
vector machine with inference transfer. Scientific Reports 13(1), 3288 (2023)
https://doi.org/10.1038/s41598-023-29495-y

[32] Havlı́ček, V., Córcoles, A.D., Temme, K., Harrow, A.W., Kandala, A.,
Chow, J.M., Gambetta, J.M.: Supervised learning with quantum-enhanced
feature spaces. Nature 567(7747), 209–212 (2019) https://doi.org/10.1038/
s41586-019-0980-2

[33] Tiwari, P., Melucci, M.: Towards a quantum-inspired framework for binary
classification. In: Proceedings of the 27th ACM International Conference on
Information and Knowledge Management, pp. 1815–1818 (2018)

25
[34] Ding, C., Bao, T.-Y., Huang, H.-L.: Quantum-inspired support vector machine.
IEEE Transactions on Neural Networks and Learning Systems 33(12), 7210–7222
(2022) https://doi.org/10.1109/TNNLS.2021.3084467

[35] Li, Z., Liu, X., Xu, N., Du, J.: Experimental realization of a quantum support
vector machine. Phys. Rev. Lett. 114, 140504 (2015) https://doi.org/10.1103/
PhysRevLett.114.140504

[36] LaRose, R., Coyle, B.: Robust data encodings for quantum classifiers. Phys. Rev.
A 102, 032420 (2020) https://doi.org/10.1103/PhysRevA.102.032420

[37] Christian, B., Rainer, B., Christian, K., Merima, M., Nico, P., Barry, R., Roland,
S., Eldar, S.: Quantum MachineLearning in the Context of IT Security, (2022)

[38] Schuld, M., Petruccione, F.: Quantum Models as Kernel Methods. Springer, Cham
(2021). https://doi.org/10.1007/978-3-030-83098-4 6 . https://doi.org/10.1007/
978-3-030-83098-4 6

[39] Kavitha, S.S., Kaulgud, N.: Quantum machine learning for support vector
machine classification. Evolutionary Intelligence (2022) https://doi.org/10.1007/
s12065-022-00756-5

[40] Kandala, A., Temme, K., Córcoles, A.D., Mezzacapo, A., Chow, J.M., Gam-
betta, J.M.: Error mitigation extends the computational reach of a noisy
quantum processor. Nature 567(7749), 491–495 (2019) https://doi.org/10.1038/
s41586-019-1040-7

[41] Temme, K., Bravyi, S., Gambetta, J.M.: Error mitigation for short-depth quan-
tum circuits. Phys. Rev. Lett. 119, 180509 (2017) https://doi.org/10.1103/
PhysRevLett.119.180509

[42] Cincio, L., Rudinger, K., Sarovar, M., Coles, P.J.: Machine learning of noise-
resilient quantum circuits. PRX Quantum 2, 010324 (2021) https://doi.org/10.
1103/PRXQuantum.2.010324
