


entropy
Article
Quantum Machine Learning—Quo Vadis?
Andreas Wichert

Department of Computer Science and Engineering, INESC-ID & Instituto Superior Técnico, University of Lisbon,
2744-016 Porto Salvo, Portugal; [email protected]

Abstract: The book Quantum Machine Learning: What Quantum Computing Means to Data Mining, by
Peter Wittek, made quantum machine learning popular to a wider audience. The promise of quantum
machine learning for big data is that it will lead to new applications due to the exponential speed-up
and the possibility of compressed data representation. However, can we really apply quantum
machine learning to real-world applications? What are the advantages of quantum machine learning
algorithms beyond some artificially constructed problems? Is the promised exponential or quadratic
speed-up realistic, assuming that real quantum computers exist? Quantum machine learning is based
on statistical machine learning. We cannot port the classical algorithms directly into quantum
algorithms due to quantum physical constraints, like the input–output problem or the normalized
representation of vectors. Theoretical speed-ups of quantum machine learning are usually analyzed
in the literature by ignoring the input destruction problem, which is the main bottleneck for data
encoding. The dilemma results from the following question: should we ignore or marginalize those
constraints or not?

Keywords: quantum machine learning; basis encoding; amplitude encoding; input destruction
problem; HHL; quantum kernels; variational algorithm

1. Introduction
Deep learning has achieved tremendous successes; it is based on error-minimizing algorithms that approximate a distribution of a population. The approximation is based on the error minimization of the prediction of a very large training sample, with the assumption that the large sample describes the population sufficiently well. The error minimization is based on a loss function and the back-propagation algorithm. The back-propagation algorithm applied to deep learning architectures requires huge computational resources in hardware, energy, and time. The promise of quantum machine learning is that it will overcome these problems due to a quadratic or even exponential speed-up in time and the possibility of compressed data representation. However, can we really apply quantum machine learning to real-world applications? What are the algorithmic constraints of quantum machine learning? Currently there is a huge body of literature on quantum machine learning, which we are unable to review. Instead we will deal with four fundamental categories of quantum machine learning, including quantum encoding.
• The process of transferring classical data into quantum states is an essential step in quantum algorithms. This process is called quantum encoding or quantum state preparation. The representation of binary patterns of a training sample using quantum encoding and the application of Grover's algorithm is presented. It leads to a quadratic speed-up of popular machine learning algorithms, like k-nearest neighbor, clustering, and associative memory.
• We describe the representation of a training sample by amplitude encoding. It is used in the quantum algorithm for linear systems of equations (HHL) based on Kitaev's phase estimation algorithm, leading to an exponential speed-up. Since almost all machine learning algorithms use some form of a linear system of equations, it is assumed that the HHL algorithm is going to be one of the most useful subroutines.


• Quantum kernels that are not based on the kernel trick but map the vectors directly
into high-dimensional space and may lead to new kernel functions.
• Variational approaches are characterized by the use of a classical optimization algorithm to iteratively update a parameterized quantum trial solution; they can open new insights
in real-world problems, like the study of the behavior of complex physical systems
that cannot be tackled with classical machine learning algorithms.

2. Binary Patterns and Grover's Algorithm


Basis encoding encodes an n-dimensional binary vector into an n-qubit quantum basis state. Ventura and Martinez [1,2] proposed a method to encode m linearly independent binary vectors of dimension n (with n > m) into a superposition of an n-qubit quantum state. We describe the simplified method by [3]. The procedure successively divides a present superposition into processing and memory branches. The input patterns are loaded into newly generated memory branches step by step. The cost of the method is linear in the number of stored patterns and their dimension [4]. At the initial step, the system is in the basis state with load qubits, memory qubits and the control qubits c1, c2

|memory; c2, c1; load⟩

(using little endian notation). The basis states are split step by step using the control qubits c1, c2 until the required superposition is present

(1/√m) ∑_{j=1}^{m} |memory; c2, c1; load⟩_j

and with the memory register in the required superposition

(1/√m) ∑_{j=1}^{m} |memory; 0, 0; 0···0⟩_j = ( (1/√m) ∑_{j=1}^{m} |memory⟩_j ) ⊗ |0, 0; 0···0⟩.

The processing branch is indicated by the control qubit c2 with the value one (c2 = 1) and the memory branch representing the current superposition with the control qubit c2 with the value zero (c2 = 0). The qubit c2 is split by the operator CS_p, represented by the parametrized U gate U(θ, ϕ, λ) with ϕ = π, λ = π, and θ = 2·arcsin(1/√p),

CS_p = CU(2·arcsin(1/√p), π, π) =
[ 1   0        0              0          ]
[ 0   1        0              0          ]
[ 0   0    √((p−1)/p)       1/√p         ]
[ 0   0     −1/√p          √((p−1)/p)    ]

with CS_p acting on |c2, c1⟩ as

CS_p |01⟩ = |01⟩,    CS_p |11⟩ = (1/√p)·|10⟩ + √((p−1)/p)·|11⟩.

The control qubit c1 = 1 indicates the split of the qubit c2. Since the control qubit c1 = 1 is entangled with the memory register, we create the memory branch (c2 = 0) with (1/√p)·|memory; 01⟩ and the processing branch (c2 = 1) with √((p−1)/p)·|memory; 11⟩ by the split operation on the preceding processing branch. A new pattern is stored in the memory register of the newly generated memory branch. We repeat the procedure until we arrive in the final state

|ψ⟩ = (1/√m) ∑_{j=1}^{m} |memory; 0, 0; 0···0⟩_j
representing the binary patterns in a superposition of m basis states. In Figure 1 we indicate a circuit for the preparation of the three states |01⟩, |10⟩, |11⟩.

Figure 1. In this example, we store three binary patterns, |01⟩, |10⟩, |11⟩. After applying the circuit the resulting state is (1/√3)|0, 0; 1, 0; 0, 0⟩ + (1/√3)|0, 1; 0, 0; 0, 0⟩ + (1/√3)|1, 0; 0, 0; 0, 0⟩. For more details see [5]. The dashed lines are visual indicators of the grouping of circuit sections.

We can apply Grover's algorithm with a quadratic speed-up in m, since the represented m basis states are known to us. The constructed Grover operator amplifies only the m basis states, and the 2^n − m states with zero amplitudes are unchanged.
Alternatively, we could entangle index qubits that are in a superposition with the binary patterns

|pattern_m⟩, |pattern_{m−1}⟩, ··· , |pattern_1⟩.

To store m binary patterns we entangle the index qubits using multi-controlled NOT gates, by generating v = log_2(m) index qubits using v Hadamard gates

H^{⊗v} |0⟩^{⊗v} = (1/√m) ∑_{j=1}^{m} |index_j⟩,

and entangle the m binary patterns with the index qubits, with the resulting superposition

(1/√m) ∑_{j=1}^{m} |index_j⟩ |pattern_j⟩.    (1)

Using an oracle o() we can mark the corresponding index state and un-compute the entangled patterns, resulting in the state

|ψ⟩ = (1/√m) ∑_{j=1}^{m} (−1)^{o(index_j)} · |index_j⟩ |0···0⟩.    (2)

We can now apply Grover's algorithm with a quadratic speed-up in m to the index states that point to our binary patterns.
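To make the amplification step concrete, the following minimal NumPy sketch (a plain statevector calculation, not the author's circuit; the register size, the stored index set, and the marked index are hypothetical) starts from a uniform superposition over m stored index states and applies the Grover operator built from a phase oracle and a reflection about that superposition. Roughly (π/4)·√(m/k) iterations concentrate the probability on the k marked indices, which is the quadratic speed-up referred to above.

```python
import numpy as np

n_qubits = 5                              # hypothetical register size
N = 2 ** n_qubits                         # number of basis states
stored = [3, 7, 11, 19, 21, 26, 28, 30]   # hypothetical m stored indices
m = len(stored)
marked = {19}                             # hypothetical index flagged by the oracle
k = len(marked)

# State of Eq. (1)/(2): uniform superposition over the m stored indices only.
psi0 = np.zeros(N)
psi0[stored] = 1.0 / np.sqrt(m)

oracle = np.ones(N)
oracle[list(marked)] = -1.0               # phase oracle: flip the sign of marked states

def grover_step(psi):
    psi = oracle * psi                      # oracle reflection
    return 2.0 * psi0 * (psi0 @ psi) - psi  # reflection about the stored-pattern state

psi = psi0.copy()
iterations = int(np.floor(np.pi / 4.0 * np.sqrt(m / k)))
for _ in range(iterations):
    psi = grover_step(psi)

p_marked = sum(abs(psi[i]) ** 2 for i in marked)
print(f"{iterations} iterations, P(marked) = {p_marked:.3f}")  # close to 1
```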

2.1. Input Destruction Problem


The naïve assumption that the speed-up is quadratic is not realistic due to the input
destruction problem (ID problem) [6–9]:
• The input (reading) problem: The quantum state is initialized by state preparation of m binary patterns. Although the existing quantum algorithm requires only √m steps and is faster than the classical algorithms, m data points must be read—prepared.
Hence, the complexity of the algorithm does not decrease.
• The destruction problem: We are required to read m data points and are allowed to
query only once because of the collapse during the measurement (destruction).
Additionally, the output quantum state in many quantum machine learning algorithms
must be fully read, but measuring the state collapses it to a single value, requiring many
measurements to interpret the full state.

2.2. Quantum Random Access Memory


To avoid the input destruction problem [10], quantum random access memory (qRAM) [11] was proposed. A qRAM accesses basis states by a copy operation [11]. A register |i⟩ is queried and the i-th binary pattern is loaded into the second register

|i⟩|0⟩ → |i⟩|x_i⟩,    (3)

with |x_i⟩ being a basis state representing a binary vector. The query operation can be executed in parallel with

(1/√m) ∑_{i=1}^{m} |i⟩|0⟩ → (1/√m) ∑_{i=1}^{m} |i⟩|x_i⟩,    (4)

with a time complexity of O(log(m)), ignoring the preparation cost (due to the input problem). The qRAM suffers from the input destruction problem. Its usage leads to a circular argument.
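The following short NumPy sketch (a classical statevector illustration, not a physical qRAM; the stored patterns are hypothetical toy data) constructs the right-hand side of Eq. (4). The circular argument is visible in the code itself: the classical loop still touches every one of the m patterns in order to build the state.

```python
import numpy as np

# Hypothetical stored binary patterns x_i (m patterns of n bits each).
patterns = ["0110", "1010", "0001", "1111"]
m, n = len(patterns), len(patterns[0])
v = int(np.ceil(np.log2(m)))                  # address qubits

# Right-hand side of Eq. (4): (1/sqrt(m)) * sum_i |i>|x_i>.
state = np.zeros(2 ** (v + n))
for i, bits in enumerate(patterns):           # classical loop over all m patterns
    index = (i << n) | int(bits, 2)           # concatenate address |i> and data |x_i>
    state[index] = 1.0 / np.sqrt(m)

print(np.linalg.norm(state))                                      # 1.0: a valid state
print([format(int(idx), f"0{v + n}b") for idx in np.flatnonzero(state)])
```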

3. Amplitude Encoding and the HHL Algorithm


Amplitude encoding encodes data into the amplitudes ω_i of a quantum state

|ψ⟩ = ∑_{i=1}^{N} ω_i · |x_i⟩.    (5)

For example, a complex normalized vector x

x = (√0.03, √0.07, √0.15, √0.05, √0.1, √0.3, √0.2, √0.1)^T

is represented as

|ψ⟩ = √0.03·|000⟩ + √0.07·|001⟩ + √0.15·|010⟩ + √0.05·|011⟩ + √0.1·|100⟩ + √0.3·|101⟩ + √0.2·|110⟩ + √0.1·|111⟩

by a top-down divide strategy using parametrized rotation gates with a linear complexity. Amplitude coding can only represent normalized vectors and also suffers from the input destruction problem.
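A minimal sketch of the top-down divide strategy mentioned above, under the assumption of the standard recursive construction with controlled R_y rotations (the exact gate decomposition used in the cited literature may differ): at each node of a binary tree the probability mass of the current block is split between its left and right halves, which fixes one rotation angle per node. The number of angles is linear in the vector dimension, which is exactly where the state-preparation cost comes from.

```python
import numpy as np

# Target amplitudes from the example above (already normalized).
x = np.sqrt([0.03, 0.07, 0.15, 0.05, 0.1, 0.3, 0.2, 0.1])

def split_angles(amplitudes):
    """Angles for a top-down amplitude-encoding tree (a sketch, not a library API).

    At each node the rotation angle theta satisfies cos(theta/2)^2 = P(left half),
    so the left branch receives amplitude cos(theta/2) and the right sin(theta/2).
    """
    angles = []

    def recurse(block):
        if len(block) == 1:
            return
        half = len(block) // 2
        p_left = float(np.sum(block[:half] ** 2))
        p_total = float(np.sum(block ** 2))
        theta = 2.0 * np.arccos(np.sqrt(p_left / p_total)) if p_total > 0 else 0.0
        angles.append(theta)
        recurse(block[:half])
        recurse(block[half:])

    recurse(np.asarray(amplitudes, dtype=float))
    return angles

angles = split_angles(x)
print(len(angles))            # 2^3 - 1 = 7 angles: linear in the vector dimension
print(np.round(angles, 4))
```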

3.1. Quantum Random Access Memory for Amplitude Coding


An operation that would produce a copy of an arbitrary quantum state such as |ψ⟩ is not possible; we cannot copy non-basis states because of the linearity of quantum mechanics. However, we can, to some extent, simulate the copy of non-basis states using quantum random access memories (qRAM) as proposed by [9]. The resulting complexity would be O(n log(m)), where n is the dimension of the resulting superposition vector [9]. We divide the binary vector of dimension 2^m into v substrings; each substring code_i represents a real number by fractional representation (binary representation of a real number)

|code_1 code_2 ··· code_n⟩ (1/√n) ∑_{i=1}^{n} |i⟩,    (6)

with n = 2^m / v and a fractional real number α_i that is smaller than one,

code_i = α_i < 1.

We add an auxiliary state |0⟩^{⊗n}

|code_1 ··· code_n⟩ (1/√n) ∑_{i=1}^{n} |i⟩ → |code_1 ··· code_n⟩ (1/√n) ∑_{i=1}^{n} |i⟩|0⟩^{⊗n}.    (7)

For each α_i we perform a controlled rotation R(α_i)

C·α_i |1⟩ + √(1 − C²·α_i²) |0⟩

and measure. By measuring the corresponding auxiliary register with the result |1⟩ we know that the resulting state is correct. However, if we measure the auxiliary register as |0⟩, we have to repeat the whole procedure. The success rate of measuring |1⟩ n times is very low (0.5^n) and converges to zero for large n. For large n it is simply not feasible. If we succeed, the resulting state would be

∑_{i=1}^{n} α_i |i⟩.    (8)

The described routine is not reversible, since it is based on measurement. If the probability of success were not very low, the complexity of reading m vectors of dimension n would be O(n log(m)), ignoring the preparation costs, compared to O(n · m) on a classical RAM. However, the qRAM for amplitude coding is not feasible and does not solve the input destruction problem.
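A small numerical sketch of the post-selection bottleneck described above (the α_i values and the normalization constant C are hypothetical): each auxiliary qubit is accepted with probability (C·α_i)², so the probability that all n rotations succeed decays exponentially with n, which is why the procedure is not feasible at scale.

```python
import numpy as np

rng = np.random.default_rng(0)

def success_probability(alphas, C=1.0):
    """Probability that every auxiliary qubit is post-selected on |1>.

    Each controlled rotation yields |1> with probability (C * alpha_i)^2,
    so the joint success probability is the product over all i.
    """
    return float(np.prod((C * np.asarray(alphas)) ** 2))

for n in [4, 8, 16, 32]:
    alphas = rng.uniform(0.3, 0.9, size=n)    # hypothetical fractional values < 1
    p = success_probability(alphas)
    print(f"n = {n:2d}: P(all |1>) = {p:.3e}, expected repetitions ~ {1.0 / p:.1e}")
```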

3.2. Quantum Algorithm for Linear Systems of Equations


Systems of linear equations can be solved by Gauss elimination with O(n³). The approximate solution for a sparse matrix via conjugate gradient descent requires much lower costs, Õ(n) [12].
By ignoring certain constraints, the quantum algorithm for linear systems of equations on a quantum computer is exponentially faster for sparse matrices than any algorithm that solves linear systems on the classical computer [10]. It is based on amplitude coding and is called the HHL algorithm according to its inventors Aram Harrow, Avinatan Hassidim, and Seth Lloyd. For an invertible complex n × n matrix A and a complex vector b

A · x = b    (9)

we want to find x. The following constraints are present.


• The vectors |b⟩ and |x⟩, represented by log₂(n) qubits, have a length of one in the l2 norm, with

  |b⟩ = ∑_{i=1}^{n} b_i |i⟩ / ∥∑_{i=1}^{n} b_i |i⟩∥    (10)

  and

  |x⟩ = ∑_{i=1}^{n} x_i |i⟩ / ∥∑_{i=1}^{n} x_i |i⟩∥.    (11)

• |b⟩ has to be prepared efficiently, with the cost no bigger than log(n).
• The matrix A is sparse.
• For the output we are interested in the global properties of |x⟩ rather than the coefficients x_i.
If A is Hermitian, A* = A (for a real matrix, A^T = A), then A can be represented by the spectral decomposition as

A = λ_1 · |u_1⟩⟨u_1| + λ_2 · |u_2⟩⟨u_2| + ··· + λ_n · |u_n⟩⟨u_n|    (12)

and

A⁻¹ = (1/λ_1) · |u_1⟩⟨u_1| + (1/λ_2) · |u_2⟩⟨u_2| + ··· + (1/λ_n) · |u_n⟩⟨u_n|.    (13)

It follows that

A⁻¹ · |u_j⟩ = (1/λ_j) |u_j⟩    (14)

and writing |b⟩ as a linear combination of the eigenvectors of A

|b⟩ = ∑_j |u_j⟩⟨u_j|b⟩    (15)

leads to

A · |b⟩ = ∑_j λ_j |u_j⟩⟨u_j|b⟩    (16)

and

|x⟩ = A⁻¹ · |b⟩ = ∑_j λ_j⁻¹ |u_j⟩⟨u_j|b⟩.    (17)

We estimate the eigenvalues using Kitaev's phase estimation algorithm, which estimates the unknown eigenvalue e^{2·π·i·θ_j}. If we apply U to |u_j⟩ we obtain

U · |u_j⟩ = e^{2·π·i·θ_j} · |u_j⟩ = e^{i·λ_j} · |u_j⟩.    (18)

This representation is similar to the evolution operator U_t = e^{−i·t·H} for t = 1 and H := A, where H is the Hamiltonian operator. The process of implementing a given Hamiltonian evolution on a quantum computer is called Hamiltonian simulation [13]. The challenge in Hamiltonian simulation is due to the fact that the application of matrix exponentials is computationally expensive [4]. The dimension of the Hilbert space grows exponentially with the number of qubits, and thus any operator will be of exponential dimension. The computation of the matrix exponential is difficult, and this is still a topic of considerable current research. Only for a sparse Hermitian matrix H can the Hamiltonian simulation be implemented efficiently. We do not need to know the eigenvectors |u_j⟩ of U, since a quantum state |b⟩ can be decomposed into an orthogonal basis

|b⟩ = ∑_j |u_j⟩⟨u_j|b⟩ = ∑_j β_j |u_j⟩.    (19)

After applying Kitaev's phase estimation algorithm to U, which we obtained by Hamiltonian simulation, for each value j the values λ̃_{k|j} approximate the true value λ_j. For simplicity, we assume

∑_{j=1}^{n} β_j ∑_{k=0}^{T−1} α_{k|j} · |λ̃_{k|j}⟩|u_j⟩ ≈ ∑_{j=1}^{n} β_j · |λ̃_j⟩|u_j⟩.    (20)

We have to measure the corresponding eigenvalues, so that we would be able to define a circuit that performs the conditioned rotation. To conduct the conditional rotation we add an auxiliary state |0⟩

∑_{j=1}^{n} β_j · |λ̃_j⟩|u_j⟩|0⟩    (21)

and perform the conditioned rotation on the auxiliary state |0⟩ by the operator R

R = ( cos α   −sin α )
    ( sin α    cos α )    (22)

with the relation

α = arcsin(C / λ̃)    (23)

with C being a constant of normalization. Each eigenvalue indicates a special rotation

∑_{j=1}^{n} β_j · |λ̃_j⟩|u_j⟩ ( R(λ̃_j⁻¹) |0⟩ ) =
∑_{j=1}^{n} β_j · |λ̃_j⟩|u_j⟩ ( (C/λ̃_j)·|1⟩ + √(1 − C²/λ̃_j²)·|0⟩ ).    (24)

We un-compute the phase estimation procedure, resulting in the state

|0⟩ ∑_{j=1}^{n} β_j · |u_j⟩ ( (C/λ̃_j)·|1⟩ + √(1 − C²/λ̃_j²)·|0⟩ ) =
|0⟩ ∑_{j=1}^{n} ( (β_j/λ̃_j)·C · |u_j⟩|1⟩ + √(1 − C²/λ̃_j²) · β_j |u_j⟩|0⟩ ).    (25)

By measuring the auxiliary qubit with the result 0 we obtain

|0⟩ ∑_{j=1}^{n} ( √(1 − C²/λ̃_j²) · β_j |u_j⟩ )    (26)

and with the result 1

C · |0⟩ ∑_{j=1}^{n} (β_j/λ̃_j) |u_j⟩ = C · |0⟩ A⁻¹|b⟩ ≈ C · |0⟩|x⟩.    (27)

We have to select the outcome of the measurement 1

|x⟩ = A⁻¹|b⟩ = ∑_{j=1}^{n} (β_j/λ̃_j) |u_j⟩    (28)

which requires several measurements. After the preceding measurements we cannot obtain the solution |x⟩ efficiently. Obtaining the required coefficients x_i from |x⟩ would require at least n measurements, so the complexity of the algorithm would be O(n), which is the same cost as an approximate solution for a sparse matrix via conjugate gradient descent on a classical computer.
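For reference, the classical baseline mentioned above can be sketched in a few lines (the tridiagonal test matrix is a hypothetical stand-in for a sparse system): the conjugate gradient solver works iteratively on the nonzero entries only, which is the Õ(n) behavior the HHL speed-up is measured against.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

n = 10_000
# Hypothetical sparse, symmetric, well-conditioned tridiagonal test matrix.
A = diags([-1.0, 3.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x, info = cg(A, b)                    # iterative solver touching only nonzeros
print(info)                           # 0 signals convergence
print(np.linalg.norm(A @ x - b))      # small residual
```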
For example, the solution to the problem

A = ( 1      −1/3 )        b = ( 1 )
    ( −1/3    1   ),           ( 0 )

is represented by

A⁻¹ = ( 9/8   3/8 )        x = ( 9/8 )
      ( 3/8   9/8 ),           ( 3/8 )

with the normalized vector being

x_n = x/∥x∥ = ( 0.948683 )
              ( 0.316228 ).

We represent

|b⟩ = |0⟩ = b = ( 1 )
                ( 0 )

with

|u_1⟩ = (|0⟩ − |1⟩)/√2,    |u_2⟩ = (|0⟩ + |1⟩)/√2

and with

|b⟩ = ∑_j β_j |u_j⟩ = |0⟩ = (1/√2)·|u_1⟩ + (1/√2)·|u_2⟩

|b⟩ = (1/√2)·(|0⟩ − |1⟩)/√2 + (1/√2)·(|0⟩ + |1⟩)/√2 = |0⟩.

We perform the conditioned rotation on the auxiliary state |0⟩ by

R_Y(α) = ( cos(α/2)   −sin(α/2) )
         ( sin(α/2)    cos(α/2) )

with the two measured control qubits being |10⟩, representing λ̃_1 = 2, and |01⟩, representing λ̃_2 = 1,

α_1 = 2·arcsin(1/λ̃_1) = 2·arcsin(1/2) = π/3    (29)

α_2 = 2·arcsin(1/λ̃_2) = 2·arcsin(1/1) = π.    (30)

The measured estimated probability values (requiring several measurements) of the HHL simulation are 0.562 and 0.0622, resulting in

x_m² = ( 0.562/(0.562 + 0.0622) )
       ( 0.0622/(0.562 + 0.0622) )

with

x_m = ( 0.948869 )   ≈   x_n = ( 0.948683 )
      ( 0.31567  )             ( 0.316228 ).
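The worked example can be checked with a few lines of NumPy (a purely classical verification, not an HHL simulation): the classical solution of A·x = b reproduces x_n, the eigendecomposition reproduces the spectral form of Equation (17), and the quoted measurement statistics 0.562 and 0.0622 renormalize to the squared entries of x_n.

```python
import numpy as np

A = np.array([[1.0, -1.0 / 3.0],
              [-1.0 / 3.0, 1.0]])
b = np.array([1.0, 0.0])

# Classical solution and its normalization (what HHL encodes in |x>).
x = np.linalg.solve(A, b)
x_n = x / np.linalg.norm(x)
print(x)                                   # [1.125 0.375] = (9/8, 3/8)
print(x_n)                                 # [0.9487 0.3162]

# Spectral view of Equation (17): |x> = sum_j (beta_j / lambda_j) |u_j>.
eigvals, eigvecs = np.linalg.eigh(A)
beta = eigvecs.T @ b                       # beta_j = <u_j|b>
x_spectral = eigvecs @ (beta / eigvals)
print(np.allclose(x_spectral, x))          # True

# The quoted probabilities 0.562 and 0.0622 renormalize to the ideal statistics.
quoted = np.array([0.562, 0.0622])
print(np.round(quoted / quoted.sum(), 4))  # [0.9004 0.0996]
print(np.round(x_n ** 2, 4))               # [0.9 0.1]
```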

4. Quantum Kernels
To avoid the input destruction problem, quantum kernels were proposed. A quantum computer can estimate a quantum kernel, and the estimate can be used by a kernel method on a classical computer [4]. Using a quantum computer results in an exponential advantage in evaluating inner products, allowing us to estimate the quantum kernel directly in the higher-dimensional space by a function ϕ(x) with

k(x, y) = |⟨ϕ(x)|ϕ(y)⟩|².    (31)

For a large high-dimensional space, such a procedure is not tractable on a classical computer [14]. However, classically, we do not compute the function ϕ(x); instead we compute inner products specified by the kernel k(x, y)

k(x, y) = Φ^T(x) Φ(y) = ⟨Φ(x)|Φ(y)⟩,    (32)

and the feature space could be of infinite dimensionality

k(x, y) = ⟨Φ(x)|Φ(y)⟩ = ∑_{j=1}^{∞} ϕ_j(x) · ϕ_j(y).    (33)

For quantum kernels the dimension of the feature space is finite, since we map the vectors directly with the aid of a quantum computer and do not specify the kernel function. The classical data are encoded into quantum data by quantum feature maps via a parametrized quantum circuit [15]. The feature vector x defines the m parameters of the parametrized quantum circuit U_{ϕ(x)}

|ϕ(x)⟩ = U_{ϕ(x)} |0⟩^{⊗m}    (34)

with the dimension of ϕ(x) being 2^m. If we map an input state |0⟩^{⊗m} with a parametrized quantum circuit U_{ϕ(x)}, with parameters that are defined by x, and un-compute it by U†_{ϕ(x)}, the inverse of the parametrized quantum circuit U_{ϕ(x)}, then the probability of measuring the state |0⟩^{⊗m} is one. If we parametrize the quantum circuit U by x (U_{ϕ(x)}) and the inverse of the parametrized quantum circuit U† by y (U†_{ϕ(y)}), and if x and y are similar, the probability of measuring |0⟩^{⊗m} for the input |0⟩^{⊗m}

U†_{ϕ(y)} U_{ϕ(x)} |0⟩^{⊗m}    (35)



should be near 1. If x and y differ a lot, this probability is smaller. The quantum kernel is represented after measurement as

k(x, y) = |⟨ϕ(x)|ϕ(y)⟩|² = |⟨0^{⊗m}| U†_{ϕ(y)} U_{ϕ(x)} |0^{⊗m}⟩|².    (36)

We have to measure the final state several times, record the number of |0^{⊗m}⟩ outcomes, and estimate the value k(x, y). The parametrized quantum circuit is based on superposition, entanglement, and rotation gates that map the feature vectors into a periodic feature space, resulting in periodic receptive fields with a fixed center. Quantum kernels can achieve a lower training error on real data. However, this improvement leads to poor generalization on the test set, and classical models can outperform quantum models [16] due to the periodic receptive fields with a fixed center, see Figure 2.

Figure 2. (a) RBF kernel with the center y = (2·π, 2·π)^T. (b) RBF kernel with the center y = (π, π)^T. (c) Quantum kernel with the center y = (2·π, 2·π)^T. (d) Quantum kernel with the center y = (π, π)^T. For the RBF kernel the position in the space defines its center; this is not the case with the quantum kernel, the center of which remains fixed; instead its wave distribution changes. In the contour plots the third dimension is indicated by color: high values are indicated by white to yellow, low values by blue.

Because of this, it is doubtful if quantum kernels offer any advantage over classical kernels for real-world applications.
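To make the periodic-receptive-field claim concrete, the following sketch evaluates a toy quantum kernel with a simple angle-encoding feature map (one R_y rotation per feature; a deliberately minimal stand-in for feature maps such as the ZZFeatureMap used in Section 5): k(x, y) = |⟨ϕ(x)|ϕ(y)⟩|² is periodic in the encoded angles, so shifting x by 2π leaves the kernel value unchanged, whereas an RBF kernel decays with the distance.

```python
import numpy as np

def ry_state(angle):
    """Single-qubit state R_y(angle)|0> = (cos(angle/2), sin(angle/2))."""
    return np.array([np.cos(angle / 2.0), np.sin(angle / 2.0)])

def phi(x):
    """Toy angle-encoding feature map: tensor product of R_y(x_i)|0> states."""
    state = np.array([1.0])
    for xi in x:
        state = np.kron(state, ry_state(xi))
    return state

def quantum_kernel(x, y):
    """k(x, y) = |<phi(x)|phi(y)>|^2, computed exactly from the statevectors."""
    return float(np.abs(phi(x) @ phi(y)) ** 2)

def rbf_kernel(x, y, gamma=1.0):
    return float(np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(y)) ** 2)))

x = np.array([1.0, 2.0])
y = np.array([0.5, 1.5])
shift = 2.0 * np.pi

print(quantum_kernel(x, y), quantum_kernel(x + shift, y))  # identical: periodic in x
print(rbf_kernel(x, y), rbf_kernel(x + shift, y))          # RBF decays with distance
```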

5. Variational Approaches
Variational approaches are seen as one of the most promising directions in quantum machine learning [16]. They do not suffer from the input destruction problem, and have no limitations from the HHL algorithm or the failure of generalization of the quantum kernels. Variational approaches iteratively update a parameterized quantum trial solution, also called an ansatz (from German Ansatz = approach), using a classical optimization algorithm. The parameterized quantum trial solution is represented by a parametrized quantum circuit in the same way as a quantum kernel.
The variational approach can be used for a binary classifier with two classes represented by the target values 0 and 1. The input data vectors x_k of dimension m and the binary output labels t_k form a training set

D = {(x_1, t_1), (x_2, t_2), ··· , (x_N, t_N)},    t_k ∈ {0, 1}.

A parametrized quantum circuit U_{ϕ(x_k)} with m parameters encodes each input data vector x_k of dimension m [4]

U_{ϕ(x_k)} |0⟩^{⊗m}.    (37)

Additionally, a variational quantum circuit represents the free parameters w

U_W(w)    (38)

that will adapt during training by using an optimizer on a classical computer, leading to the state

|ψ(x_k, w)⟩ = U_W(w) · U_{ϕ(x_k)} |0⟩^{⊗m}.    (39)

The state |ψ(x_k, w)⟩ is measured, yielding a basis state |q_m ··· q_1 q_0⟩ that represents a binary string, see Figure 3. The measured binary string represents a parity function. A Boolean function whose value is one if and only if the input vector has an odd number of ones represents a parity function. For each input data vector x_k we determine the output function o_k ∈ {0, 1} and perform an adaptation of the parameters w of the variational quantum circuit U_W(w) using an optimizer on a classical computer that minimizes the loss function between the predicted values, represented by the parity function of the measured basis state, and the target values. The optimizer approximates a stochastic gradient descent by the Simultaneous Perturbation Stochastic Approximation (SPSA). SPSA requires only two evaluations of the loss function to estimate the stochastic gradient, regardless of the dimension of the optimization problem [17].
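A compact sketch of the SPSA update used by the classical optimizer (the loss here is a stand-in quadratic, and the gain constants a and c are hypothetical; in the variational classifier the loss would be estimated from repeated measurements of the parity of the circuit output): each iteration perturbs all parameters simultaneously with a random ±1 vector and uses exactly two loss evaluations to form the gradient estimate.

```python
import numpy as np

rng = np.random.default_rng(1)

def loss(w):
    """Stand-in quadratic loss; in the classifier this would be estimated from
    measurement statistics of the parity of |psi(x_k, w)>."""
    target = np.array([0.3, -1.2, 0.7, 2.0])   # hypothetical optimum
    return float(np.sum((w - target) ** 2))

def spsa_minimize(w0, iterations=200, a=0.2, c=0.1):
    w = np.array(w0, dtype=float)
    for k in range(1, iterations + 1):
        a_k = a / k ** 0.602                    # standard SPSA gain decay exponents
        c_k = c / k ** 0.101
        delta = rng.choice([-1.0, 1.0], size=w.shape)   # simultaneous +/-1 perturbation
        # Exactly two loss evaluations per iteration, independent of dim(w).
        g_hat = (loss(w + c_k * delta) - loss(w - c_k * delta)) / (2.0 * c_k * delta)
        w -= a_k * g_hat
    return w

w_final = spsa_minimize(np.zeros(4))
print(np.round(w_final, 3), round(loss(w_final), 6))   # close to the target parameters
```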

Figure 3. We use the parameterized circuit U_{ϕ(x)} = ZZFeatureMap with repetition for the input of dimension two, mapping it into a 2²-dimensional space. For U_W(w) we use the TwoLocal circuit. It is a parameterized circuit consisting of alternating rotation layers and entanglement layers. An input vector x_k defines the circuit U_{ϕ(x_k)}. We determine the output function o_k ∈ {0, 1}. We minimize the loss function by the SPSA optimizer on a classical computer, resulting in new parameters w that define the adapted variational quantum circuit U_W(w). We repeat the process using the adapted variational circuit for the next training pattern until the error represented by the loss function is minimal. (The dashed lines are visual indicators of the grouping of circuit sections.)

The advantages of classical deep learning models, like the principle of hierarchical organization, the representation of big training sets, or regularization, cannot be met by variational algorithms. The idea of hierarchical structures is based on the decomposition of a hierarchy into simpler parts, leading to a more efficient way of representing information. This principle appears in nature; for example, the structure of matter itself is hierarchically organized by elementary particles, atomic nuclei, atoms, and molecules [18]. The deep learning approach learns the hierarchical structure from the training data, leading to high-level abstractions in data by architectures composed of multiple nonlinear transformations using gradient descent through error back-propagation. The back-propagation algorithm leads to a hierarchy from low-level structures to high-level structures, as demonstrated by natural complexity [19–22]. Variational quantum circuits cannot propagate the error backward, since this operation would require the determination of the activity of each layer for the classical optimizer; this operation can only be performed by measurement, which would lead to collapse. By increasing the number of layers in deep learning, we can increase the number of parameters. By doing so, we can add enough degrees of freedom to model large training sets. This is extremely helpful since nowadays, for specific tasks, a really large amount of data is collected. Ideally, to overcome overfitting, one has to use a model that has the right capacity. However, this task is difficult and costly since it involves searching through many different architectures. Many experiments with different numbers of neurons and hidden layers have to be conducted. Using an over-parameterized deep learning network, one can constrain it to the right complexity through regularization. The search for the correct model complexity can be conducted efficiently through empirical experiments. These advantages leading to the success of deep learning cannot be met by variational algorithms.

6. Conclusions
The input destruction problem has not yet been solved, and theoretical speed-ups
are usually analyzed by ignoring the input destruction problem. Other constraints are the
normalized representation of vectors by amplitudes and quantum kernels with periodic
receptive fields. We arrive at a dilemma: should we ignore or marginalize those constraints
or not? Until now, no theoretical advantage over classical algorithms on real data has been shown. It is questionable whether the input destruction problem will ever be solved and whether a qRAM is possible. Usually the constraints are ignored or marginalized.
Instead, we should determine if quantum machine learning (without the discussed
constraints) can solve any problems more efficiently than a classical computer. Until now,
such problems do not exist and we are far away from showing how to use quantum machine
learning algorithms for real-world data. Are there any problems for which a quantum
computer is more useful than a classical computer? If we do not answer these questions
and overestimate the power of quantum machine learning, a quantum machine learning winter will follow, in analogy to the AI winter.
What could be a possible answer? We can only speculate that in the future real
quantum data will be generated through quantum physical experiments or that some
new physical breakthroughs will lead to new efficient methods for the state preparation.
Putting the speculation aside, quantum symbolic AI algorithms offer a better alternative for
successful applications. This is because they avoid the input problem: they do not represent
large training data by quantum states. Real-world applications, like the simulation of
chemical reactions and optimization problems, should be investigated. The simulation of
chemical reactions is based on Hamiltonian simulation together with variational quantum
Eigensolvers [23]. Variational quantum Eigensolvers are based on a variational algorithm
that estimates the ground state of a system [4,23]. Additionally, a promising direction is
indicated by the Boltzmann machine [24–26] and deep belief networks [27,28] rather than
the back-propagation algorithm.

Funding: This research received no external funding.


Data Availability Statement: The original contributions presented in the study are included in the
article, further inquiries can be directed to the corresponding author.
Conflicts of Interest: The author declares no conflicts of interest.

References
1. Ventura, D.; Martinez, T. Quantum associative memory with exponential capacity. In Proceedings of the Neural Networks Proceedings, 1998 IEEE World Congress on Computational Intelligence, Arlington, VA, USA, 24–26 August 1998; Volume 1, pp. 509–513.
2. Ventura, D.; Martinez, T. Quantum associative memory. Inf. Sci. 2000, 124, 273–296.
3. Trugenberger, C.A. Probabilistic Quantum Memories. Phys. Rev. Lett. 2001, 87, 067901.
4. Schuld, M.; Petruccione, F. Supervised Learning with Quantum Computers; Springer: Berlin/Heidelberg, Germany, 2018.
5. Wichert, A. Quantum Artificial Intelligence with Qiskit; CRC Press: Boca Raton, FL, USA, 2024.
6. Wittek, P. Quantum Machine Learning, What Quantum Computing Means to Data Mining; Elsevier Insights: Amsterdam, The
Netherlands; Academic Press: Cambridge, MA, USA, 2014.
7. Aïmeur, E.; Brassard, B.; Gambs, S. Quantum speed-up for unsupervised learning. Mach. Learn. 2013, 90, 261–287.
8. Aaronson, S. Quantum Machine Learning Algorithms: Read the Fine Print. Nat. Phys. 2015, 11, 291–293.

9. Wichert, A. Principles of Quantum Artificial Intelligence: Quantum Problem Solving and Machine Learning, 2nd ed.; World Scientific:
Singapore, 2020.
10. Harrow, A.; Hassidim, A.; Lloyd, S. Quantum algorithm for solving linear systems of equations. Phys. Rev. Lett. 2009, 103, 150502.
11. Giovannetti, V.; Lloyd, S.; Maccone, L. Quantum Random Access Memory. Phys. Rev. Lett. 2008, 100, 160501.
12. Reif, J.H. Efficient Approximate Solution of Sparse Linear Systems. Comput. Math. Appl. 1998, 36, 37–58.
13. Lloyd, S. Universal Quantum Simulators. Science 1996, 273, 1073–1078.
14. Schuld, M.; Killoran, N. Quantum Machine Learning in Feature Hilbert Spaces. Phys. Rev. Lett. 2019, 122, 040504.
15. Havlicek, V.; Corcoles, A.D.; Temme, K.; Harrow, A.W.; Kandala, A.; Chow, J.M.; Gambetta, J.M. Supervised learning with
quantum-enhanced feature spaces. Nature 2019, 567, 210–212.
16. Jerbi, S.; Fiderer, L.J.; Nautrup, H.P.; Kübler, J.M.; Briegel, H.J.; Dunjko, V. Quantum machine learning beyond kernel methods.
Nat. Commun. 2023, 14, 517.
17. Bhatnagar, S.; Prasad, H.L.; Prashanth, L.A. Stochastic Recursive Algorithms for Optimization: Simultaneous Perturbation Methods;
Springer: Berlin/Heidelberg, Germany, 2013.
18. Resnikoff, H.L. The Illusion of Reality; Springer-Verlag: Berlin/Heidelberg, Germany, 1989.
19. LeCun, Y.; Bengio, Y. Convolutional Networks for Images, Speech, and Time Series; MIT Press: Cambridge, MA, USA, 1998; pp. 255–258.
20. Riesenhuber, M.; Poggio, T. Hierarchical models of object recognition in cortex. Nat. Neurosci. 1999, 2, 1019–1025.
21. Riesenhuber, M.; Poggio, T. Models of object recognition. Nat. Neurosci. 2000, 3, 1199–1204.
22. Riesenhuber, M.; Poggio, T. Neural mechanisms of object recognition. Curr. Opin. Neurobiol. 2002, 12, 162–168.
23. Hidary, J.D. Quantum Computing: An Applied Approach; Springer: Berlin/Heidelberg, Germany, 2019.
24. Hinton, G.E.; Sejnowski, T.J. Optimal perceptual inference. In Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, Washington, DC, USA, 19–23 June 1983; pp. 448–453.
25. Ackley, D.H.; Hinton, G.E.; Sejnowski, T.J. A learning algorithm for Boltzmann machines. Cogn. Sci. 1985, 9, 147–169.
26. Hinton, G.E.; Sejnowski, T.J. Learning and Relearning in Boltzmann Machines. In Parallel Distributed Processing: Explorations in the
Microstructure of Cognition. Volume 1: Foundations; Rumelhart, D.E., McClelland, J.L., Eds.; The MIT Press: Cambridge, MA, USA,
1986; pp. 282–317.
27. Smolensky, P. Information Processing in Dynamical Systems: Foundations of Harmony Theory. In Parallel Distributed Processing:
Explorations in the Microstructure of Cognition. Volume 1: Foundations; Rumelhart, D.E., McClelland, J.L., Eds.; The MIT Press:
Cambridge, MA, USA, 1986; pp. 194–281.
28. Salakhutdinov, R.; Hinton, G. An Efficient Learning Procedure for Deep Boltzmann Machines. Neural Comput. 2012, 24, 1967–2006.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
