
A Quantum Model for Multilayer Perceptron

Changpeng Shao
Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China

arXiv:1808.10561v2 [quant-ph] 8 Sep 2018
E-mail: [email protected]

Abstract. The multilayer perceptron is the most commonly used class of feed-forward
artificial neural network, with applications in diverse fields such as speech recognition,
image recognition, and machine translation. To cater for the fast development of
quantum machine learning, in this paper we propose a new model to study the
multilayer perceptron on a quantum computer. This includes the tasks of preparing
the quantum state of the output signal in each layer and of establishing the quantum
version of the learning algorithm for the weights in each layer. We show that the
corresponding quantum versions achieve at least a quadratic, and in some cases an
exponential, speedup over the classical algorithms. This provides an efficient method
to study the multilayer perceptron and its applications in machine learning on a
quantum computer. Finally, as an inspiration, an exponentially fast learning algorithm
(based on Hebb's learning rule) for the Hopfield network is proposed.

Keywords: Artificial neural networks, multilayer perceptron, Hopfield neural network, quantum computing, quantum algorithm, quantum machine learning.

1. Introduction

Inspired by biological neural networks, artificial neural networks (ANNs) are massively
parallel computing models consisting of an extremely large number of simple processors
with many interconnections. ANNs are among the most successful approaches to machine
learning. Researchers from different scientific disciplines are designing different ANN
models to solve various problems in pattern recognition, clustering, approximation,
prediction, optimization, control and so on [6, 11, 17, 18, 19, 28, 36]. Hence ANNs
are of great interest for quantum adaptation. Quantum neural networks (QNNs) are
models combining the powerful features of quantum computing (such as superposition
and entanglement) and ANNs (such as parallel computing). However, the nonlinear
dynamics of ANNs are very different from unitary operations (which are linear)
of quantum computing. To find a meaningful QNN model that integrates both
fields is a highly nontrivial task [29]. Many QNN models have been proposed
[2, 3, 4, 20, 25, 29, 30, 34, 35, 37].
In [2], Altaisky introduced a quantum perceptron modelled by the quantum updating function
$|y(t)\rangle = \hat{F}\sum_i \hat{w}_i^y(t)|x_i\rangle$, where $\hat{F}$ is an arbitrary unitary
operator and $\hat{w}_i^y(t)$ is an operator representing the weights. The training rule is
$\hat{w}_i^y(t+1) = \hat{w}_i^y(t) + \eta(|d\rangle - |y(t)\rangle)\langle x_i|$, where $|d\rangle$ is the quantum state of the target
vector, $|y(t)\rangle$ is the output state of the quantum perceptron and $\eta$ is a learning parameter.
The author notes that this training rule is by no means unitary with respect to the components
of the weight matrix, so it fails to preserve unitarity and hence the total probability of the system.
A large class of ANNs uses binary McCulloch-Pitts neurons [22], in which the neural
cells are assumed to be either active or resting, so the inputs and outputs take values in $\{-1, 1\}$.
There is therefore a natural correspondence between the inputs $\{-1, 1\}$ and the qubit states $|0\rangle, |1\rangle$.
All possible inputs $x_1, \ldots, x_n \in \{-1,1\}$ correspond naturally to a quantum state
$$\sum_{x_1,\ldots,x_n \in \{-1,1\}} \alpha_{x_1,\ldots,x_n} |x_1,\ldots,x_n\rangle,$$
where $\alpha_{x_1,\ldots,x_n}$ is the amplitude of $|x_1,\ldots,x_n\rangle$.
In [30], Schuld et al. considered the simulation of a perceptron on a quantum computer in
this model. They put the signals $x_1, \ldots, x_n \in \{-1,1\}$ into qubits $|x_1,\ldots,x_n\rangle$, so that the
output of a perceptron under the threshold function can be computed to some precision
by applying quantum phase estimation to a unitary operator generated by
the weights. This model uses $n$ qubits, so the complexity of the method given in [30] is
linear in $n$. For an introduction to other QNN models, we refer to [29].
In this paper, we take a different approach to studying the multilayer perceptron (MLP) on a quantum
computer: we encode the $x_i$ as the amplitudes of qubits to generate the quantum state
$|x\rangle = \frac{1}{\|\mathbf{x}\|}\sum_i x_i |i\rangle$, where $\mathbf{x} = (x_1, \ldots, x_n)$ is the vector of input
signals. This idea has been applied in [23] to study the quantum Hopfield neural network.
One apparent advantage is that it only uses $O(\log n)$ qubits, so it may perform better
than the model used in [30, 31]. Moreover, in the quantum state $|x\rangle$ the
input signal $x_i$ can take any value, so it is more general than the model
considered in [30]. Similarly, we can generate the quantum state $|w\rangle$ of the weight vector
$\mathbf{w} = (w_1, \ldots, w_n)$. The output of the MLP can be any nonlinear function of $\mathbf{x}\cdot\mathbf{w} := \sum_i x_i w_i$.
Computing $\mathbf{x}\cdot\mathbf{w}$ (or $\langle x|w\rangle$ when we are only interested in the sign) is not difficult
on a quantum computer thanks to swap test [8]. Moreover, any function of this value can
be obtained by adding an ancilla qubit, just as the HHL algorithm does [12]. This shows
another advantage of representing the input signal vector $\mathbf{x}$ by $|x\rangle$.
However, reading the output is just the first step in studying the quantum MLP. A
bigger challenge is the learning algorithm, because of the nonlinear structure of the MLP.
Learning (or self-learning) ability is one of the most significant features of the MLP: many
applications are achieved by first constructing a suitable network and then letting the MLP
obtain the desired results through learning. So establishing the learning algorithm of the MLP
on a quantum computer is both necessary and important. In this paper, we provide a method
to integrate the nonlinear structure of the MLP into a quantum computer, while preserving the
good features of the MLP (e.g., parallelism) and of the quantum computer (e.g., superposition)
simultaneously. This provides a new model to study the MLP and its applications in quantum
machine learning.
The integration is achieved by a new technique called parallel swap test, proposed in
a previous paper [32] to study matrix multiplication. It is a generalization of swap test
[8], but in a parallel form. Simply speaking, parallel swap test can output quantum
information in parallel, which is required in the construction of the quantum MLP. With
this technique, we show that the computation of the output signals and the learning
algorithms (online and batch) for the weights of the quantum MLP can be performed much faster
(at least quadratically, and sometimes exponentially, faster) than their classical versions. Finally, as an
inspiration, we study the learning (by Hebb's learning rule) of the Hopfield network; this
also achieves an exponential speedup in the number of input neurons over the classical
learning algorithm.
The structure of this paper is as follows. In section 2, we extend the swap test
technique into a parallel form, which plays a central role in the simulation of the
quantum MLP. In section 3, we propose a technique to prepare quantum states that is
exponentially faster than previous quantum algorithms; although it cannot prepare the
quantum states of all vectors efficiently, it can solve many practical problems as fast as
desired. In section 4, we focus on the simulation of a perceptron. The simulation of the MLP
on a quantum computer, which includes the reading of the output data and the back-propagation
learning algorithm, is studied in section 5. Finally, in section 6, we study the learning
algorithm of the Hopfield neural network on a quantum computer.
Notation. We use bold letters $\mathbf{x}, \mathbf{y}, \mathbf{z}, \ldots$ to denote vectors and
$|x\rangle, |y\rangle, |z\rangle, \ldots$ to denote their quantum states. The norm $\|\cdot\|$ always refers to the
2-norm of vectors.

2. Swap test in parallel

The swap test was first proposed in [8] as an application of the quantum phase estimation
algorithm and Grover's search; it can be used to estimate the probabilities or
amplitudes of certain desired quantum states. It plays an important role in many
quantum machine learning algorithms for estimating the inner product of two quantum
states. In the following, we first briefly review the underlying problem that swap test
addresses and the basic procedure to solve it. Then we review the generalized form of
swap test considered in a previous work [32] to study matrix multiplication on a quantum
computer. This will play a central role in our construction of the model of the multilayer
perceptron on a quantum computer.
Let
$$|\phi\rangle = \sin\theta\,|0\rangle|u\rangle + \cos\theta\,|1\rangle|v\rangle \qquad (1)$$
be an unknown quantum state that can be prepared in time $O(T_{\rm in})$, where $|u\rangle, |v\rangle$ are
normalized quantum states and $\theta$ is an unknown angle. The problem is how
to estimate $\theta$ on a quantum computer to precision $\epsilon$ with a high success probability.

Suppose that $|\phi\rangle$ comes from some other quantum algorithm, which means there is
a given unitary $U$, implementable in time $O(T_{\rm in})$, such that $|\phi\rangle = U|0\rangle$. Let $Z$
be the 2-dimensional unitary transformation that maps $|0\rangle$ to $-|0\rangle$ and $|1\rangle$ to $|1\rangle$
(the Pauli-Z matrix up to a global sign). Denote
$G = (2|\phi\rangle\langle\phi| - I)(Z\otimes I) = U(2|0\rangle\langle 0| - I)U^\dagger (Z\otimes I)$,
which is a rotation similar to the one used in Grover's algorithm. Then
$$G = \begin{pmatrix} \cos 2\theta & \sin 2\theta \\ -\sin 2\theta & \cos 2\theta \end{pmatrix}$$
under the basis $\{|0\rangle|u\rangle, |1\rangle|v\rangle\}$. The eigenvalues of $G$ are $e^{\pm i 2\theta}$ and the corresponding
eigenvectors are $|w_\pm\rangle = \frac{1}{\sqrt{2}}(|0\rangle|u\rangle \pm i|1\rangle|v\rangle)$. Note that
$|\phi\rangle = -\frac{i}{\sqrt{2}}(e^{i\theta}|w_+\rangle - e^{-i\theta}|w_-\rangle)$.
So we perform the quantum phase estimation algorithm on $G$ with initial state $|0\rangle^{\otimes n}|\phi\rangle$ for
some $n = O(\log 1/\epsilon)$, obtaining a good approximation of the state
$$-\frac{i}{\sqrt{2}}\Big[ e^{i\theta}|y\rangle|w_+\rangle - e^{-i\theta}|-y\rangle|w_-\rangle \Big], \qquad (2)$$
where $y \in \mathbb{Z}_{2^n}$ satisfies $|\theta - y\pi/2^n| \le \epsilon$. The time complexity of the above procedure is
$O(T_{\rm in}/\epsilon)$. Performing a measurement on (2), we obtain an $\epsilon$-approximation of $\theta$.
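As a quick numerical sanity check (our own illustration, not part of the original analysis), the following Python snippet verifies that the rotation $G$ above, written in the basis $\{|0\rangle|u\rangle, |1\rangle|v\rangle\}$, indeed has eigenvalues $e^{\pm 2i\theta}$, which is exactly the quantity that quantum phase estimation extracts.

```python
import numpy as np

# Sanity check (illustration only): the Grover-like rotation G acts on
# span{|0>|u>, |1>|v>} as a 2x2 rotation with eigenvalues exp(+/- 2i*theta).
theta = 0.3
G = np.array([[ np.cos(2 * theta), np.sin(2 * theta)],
              [-np.sin(2 * theta), np.cos(2 * theta)]])

eigvals = sorted(np.linalg.eigvals(G), key=np.angle)
expected = sorted([np.exp(2j * theta), np.exp(-2j * theta)], key=np.angle)
print(np.allclose(eigvals, expected))  # True
```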
Now let $|x\rangle, |y\rangle$ be two real quantum states that can be prepared in time $O(T_{\rm in})$
(the restriction to real quantum states can be removed). The above method then gives a quantum
algorithm to estimate $\langle x|y\rangle$ to accuracy $\epsilon$ in time $O(T_{\rm in}/\epsilon)$.
Indeed, we just need to consider the state
$$|\phi\rangle = \frac{1}{\sqrt{2}}\big(|+\rangle|x\rangle + |-\rangle|y\rangle\big)
= \frac{1}{2}\big(|0\rangle(|x\rangle + |y\rangle) + |1\rangle(|x\rangle - |y\rangle)\big). \qquad (3)$$
The probability of $|0\rangle$ (resp. $|1\rangle$) is $(1+\langle x|y\rangle)/2$ (resp. $(1-\langle x|y\rangle)/2$). So we
can set $\sin\theta = \sqrt{(1+\langle x|y\rangle)/2}$ and $\cos\theta = \sqrt{(1-\langle x|y\rangle)/2}$. The quantum state
$|\phi\rangle$ can then be rewritten in the form (1), with $|u\rangle, |v\rangle$ the normalizations of
$|x\rangle + |y\rangle$ and $|x\rangle - |y\rangle$. Therefore, the inner product $\langle x|y\rangle$ can be evaluated in time $O(T_{\rm in}/\epsilon)$
to precision $\epsilon$. Note that if $|x\rangle, |y\rangle$ are complex quantum states, then the probability of
$|0\rangle$ (resp. $|1\rangle$) is $(1+\mathrm{Re}\langle x|y\rangle)/2$ (resp. $(1-\mathrm{Re}\langle x|y\rangle)/2$), so the above method only yields
$\mathrm{Re}\langle x|y\rangle$. However, the imaginary part of $\langle x|y\rangle$ can be computed by
considering the inner product of $|x\rangle$ with $i|y\rangle$. Summarizing, we get the following
result, which is known as swap test [8].

Proposition 1. Let $|x\rangle, |y\rangle$ be two quantum states which can be prepared in time $O(T_{\rm in})$.
Then $\langle x|y\rangle$ can be estimated to precision $\epsilon$ in time $O(T_{\rm in}/\epsilon)$.
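To make the statistics behind Proposition 1 concrete, here is a small classical simulation (not a quantum implementation; the helper name swap_test_estimate is ours): it samples the first qubit of the state (3), whose probability of $|0\rangle$ is $(1+\langle x|y\rangle)/2$ for real states, and recovers the inner product from the measurement record.

```python
import numpy as np

rng = np.random.default_rng(0)

def swap_test_estimate(x, y, shots=100_000):
    """Classically simulate the measurement statistics of the state (3)."""
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)
    p0 = (1.0 + np.dot(x, y)) / 2.0        # probability of measuring |0>
    zeros = rng.binomial(shots, p0)        # simulated shot record
    return 2.0 * zeros / shots - 1.0       # estimate of <x|y>

x = rng.standard_normal(8)
y = rng.standard_normal(8)
exact = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(swap_test_estimate(x, y), exact)    # the two agree up to ~1/sqrt(shots)
```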
As one can see, swap test only returns the result of a single inner product. One
problem is that if there are $N$ such inner products we want to estimate, then we must
apply swap test at least $O(N)$ times, which inevitably increases the overall
complexity. In the following, we extend swap test into a parallel form that allows
us to estimate all the inner products in parallel.
Let $f(y)$ be a function such that $f(y) = f(-y)$ (i.e., $f$ is even). Then from (2) we can obtain
$$|f(\theta)\rangle|\phi\rangle, \qquad (4)$$
by adding a register to store $f(\theta)$ and undoing the quantum phase estimation. This is
a quantum state in which we want to further use the quantum information about $\theta$
rather than output it. Moreover, from (3) and (4) we can actually obtain the quantum state
$$\frac{1}{\sqrt{2}}\Big(|0\rangle|x\rangle + |1\rangle|y\rangle\Big)\Big|f(\langle x|y\rangle)\Big\rangle$$
for any function $f$, since $\langle x|y\rangle = -\cos 2\theta$ is an even function of $\theta$. The case of complex
quantum states can be handled similarly by considering $\langle x|y\rangle$ and $\langle x|(i|y\rangle)$ in parallel.
Summarizing the above analysis, we have
Proposition 2. Let $|x\rangle, |y\rangle$ be two quantum states which can be prepared in time $O(T_{\rm in})$,
and let $f$ be any function. Then there is a quantum algorithm running in time $O(T_{\rm in}/\epsilon)$ that achieves
$$\frac{1}{\sqrt{2}}\big(|0\rangle|x\rangle + |1\rangle|y\rangle\big) \mapsto
\frac{1}{\sqrt{2}}\big(|0\rangle|x\rangle + |1\rangle|y\rangle\big)|f(s)\rangle, \qquad (5)$$
where $|\langle x|y\rangle - s| \le \epsilon$.
From Proposition 2, it is easy to obtain the following result.

Theorem 1 (Parallel Swap Test). Given $2N$ quantum states $|u_0\rangle, |v_0\rangle, \ldots, |u_{N-1}\rangle, |v_{N-1}\rangle$
which can be prepared in time $O(T_{\rm in})$, and $N$ functions $f_0, \ldots, f_{N-1}$, there is a
quantum algorithm with runtime $O(T_{\rm in}/\epsilon)$ that prepares the quantum state
$$\frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} |j\rangle |f_j(s_j)\rangle, \qquad (6)$$
where $|s_j - \langle u_j|v_j\rangle| \le \epsilon$.

Proof. The state (6) can be obtained by performing (5) in parallel, controlled on the index
register. More precisely, first construct the quantum state $\frac{1}{\sqrt{N}}\sum_{j=0}^{N-1}|j\rangle$. Then, viewing $|j\rangle$
as a control register, prepare $|\phi_j\rangle := \frac{1}{\sqrt{2}}(|u_j\rangle|0\rangle + |v_j\rangle|1\rangle)$, so that we efficiently obtain a
state of the form $\frac{1}{\sqrt{N}}\sum_{j=0}^{N-1}|j\rangle|\phi_j\rangle$. For each $|\phi_j\rangle$, the transformation
(5) yields the information $f_j(\langle u_j|v_j\rangle)$, giving $\frac{1}{\sqrt{N}}\sum_{j=0}^{N-1}|j\rangle|\phi_j\rangle|f_j(\langle u_j|v_j\rangle)\rangle$. As
discussed above, this is achieved by performing quantum phase estimation on a unitary
operator $G_j$ that depends on $|\phi_j\rangle$; hence the state $\frac{1}{\sqrt{N}}\sum_{j=0}^{N-1}|j\rangle|\phi_j\rangle|f_j(\langle u_j|v_j\rangle)\rangle$
is obtained by performing quantum phase estimation on $\sum_j |j\rangle\langle j| \otimes G_j$. Finally, we
obtain the desired state (6) by undoing the preparation of $|\phi_j\rangle$.
Corollary 1. For any given quantum state $\sum_j \alpha_j|j\rangle$ which can be prepared in time
$O(T_{\rm in})$, and any function $f$, we can obtain $\sum_j \alpha_j|j\rangle|f(\tilde{\alpha}_j)\rangle$ in time $O(T_{\rm in}/\epsilon)$, where
$|\alpha_j - \tilde{\alpha}_j| \le \epsilon$.

Proof. Denote the given quantum state by $|\phi\rangle = \sum_j \alpha_j|j\rangle$; then $\alpha_j = \langle j|\phi\rangle$. The desired
result follows in a similar way to the proof of Theorem 1.
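The following classical analogue (illustration only; the vectorized computation stands in for the superposition over $j$) shows the kind of output Theorem 1 and Corollary 1 provide: every index $j$ ends up paired with $f_j(s_j)$, where $s_j$ approximates $\langle u_j|v_j\rangle$, in a single pass.

```python
import numpy as np

rng = np.random.default_rng(3)

def normalize(M):
    return M / np.linalg.norm(M, axis=1, keepdims=True)

N, dim = 5, 8
U = normalize(rng.standard_normal((N, dim)))   # the states |u_j>
V = normalize(rng.standard_normal((N, dim)))   # the states |v_j>

s = np.einsum("jd,jd->j", U, V)                # all inner products <u_j|v_j> at once
f = np.cos                                     # an example choice of function f_j
print(np.column_stack([np.arange(N), f(s)]))   # pairs (j, f(s_j)), as in Eq. (6)
```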

3. Preparation of quantum states

For further application in the quantum model of the multilayer perceptron, we discuss in this
section one method of quantum state preparation. Let $\mathbf{x} = (x_0, \ldots, x_{m-1})$
be any complex vector; then its quantum state $|x\rangle = \frac{1}{\|\mathbf{x}\|}\sum_{j=0}^{m-1} x_j|j\rangle$ can be prepared
by the following simple procedure (the idea comes from Clader et al. [10]):

Step 1: prepare $\frac{1}{\sqrt{m}}\sum_{j=0}^{m-1}|j\rangle$.

Step 2: apply a controlled operation to load the components of $\mathbf{x}$, that is, prepare
$$\frac{1}{\sqrt{m}}\sum_{j=0}^{m-1}|j\rangle\Big(t x_j|0\rangle + \sqrt{1 - t^2|x_j|^2}\,|1\rangle\Big), \qquad (7)$$
where $t = 1/\max_j|x_j|$. The first part contains the information of $|x\rangle$, and $|0\rangle$ serves
as a flag qubit. The complexity to obtain $|x\rangle$ is
$O(\sqrt{m}\max_j|x_j|(\log m)/\|\mathbf{x}\|) = O((\log m)\max_j|x_j|/\min_j|x_j|)$
when all components of $\mathbf{x}$ are nonzero. We can overcome
the case $\min_j|x_j| = 0$ by considering only the nonzero components of $\mathbf{x}$. However,
before performing any measurement, the complexity to obtain (7) is only $O(\log m)$. Sometimes
the quantum state (7), which contains the information of $|x\rangle$ and the norm $\|\mathbf{x}\|$, is already enough
to solve certain problems.
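For concreteness, the following amplitude bookkeeping (a classical sketch for a real vector $\mathbf{x}$, not a circuit) reproduces the structure of (7): post-selecting the flag qubit on $|0\rangle$ leaves exactly $|x\rangle$, and the success probability $t^2\|\mathbf{x}\|^2/m$ is what drives the $O(\sqrt{m})$ factor above.

```python
import numpy as np

# Classical bookkeeping for the state in Eq. (7), assuming a real vector x.
x = np.array([0.5, -1.0, 2.0, 0.25])
m = len(x)
t = 1.0 / np.max(np.abs(x))

amp_flag0 = t * x / np.sqrt(m)                 # amplitudes attached to flag |0>
p_success = np.sum(amp_flag0 ** 2)             # = t^2 ||x||^2 / m
post_selected = amp_flag0 / np.sqrt(p_success) # state after measuring the flag as |0>

print(p_success)                               # ~1/p_success repetitions are expected
print(np.allclose(post_selected, x / np.linalg.norm(x)))  # True: this is |x>
```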
A more efficient quantum algorithm is based on the linear combination of unitaries
technique (LCU for short), which was first proposed in [32] (see also [33] for more
details). The LCU problem can be stated as follows: given $m$ complex numbers $\alpha_j$ and
$m$ quantum states $|v_j\rangle$ ($j = 0, 1, \ldots, m-1$) that can be prepared efficiently in time $O(T_{\rm in})$,
how do we prepare the quantum state $|y\rangle$ proportional to $\mathbf{y} = \sum_{j=0}^{m-1}\alpha_j|v_j\rangle$,
and what is the corresponding complexity? This LCU idea was used to simulate Hamiltonians [5]
and to solve linear systems [9].

Write $\alpha_j = r_j e^{i\theta_j}$, where $r_j > 0$ is the modulus of $\alpha_j$, and denote $s = \sum_{j=0}^{m-1} r_j$. Define
the unitary operator $S$ by $S|0\rangle = \frac{1}{\sqrt{s}}\sum_{j=0}^{m-1}\sqrt{r_j}|j\rangle$. Then $|y\rangle$ can be obtained by
the following procedure. Prepare the initial state $\frac{1}{\sqrt{s}}\sum_{j=0}^{m-1}\sqrt{r_j}|j\rangle|0\rangle$ by $S$. Then
conditionally prepare $|v_j\rangle$ controlled on the register $|j\rangle$, obtaining
$\frac{1}{\sqrt{s}}\sum_{j=0}^{m-1}\sqrt{r_j}\,e^{i\theta_j}|j\rangle|v_j\rangle$.
Finally, apply $S^\dagger$ on the register $|j\rangle$, which yields
$\frac{1}{s}|0\rangle\sum_{j=0}^{m-1}\alpha_j|v_j\rangle + \text{orthogonal parts}$. It is easy to
see that the complexity to obtain $|y\rangle$ equals $O((T_{\rm in} + \log m)s/\|\mathbf{y}\|)$. A direct corollary
of this LCU is
Proposition 3. For any vector $\mathbf{x} = (x_0, \ldots, x_{m-1})$, its quantum state can be prepared
in time $O(\kappa(\mathbf{x})\log m)$, where $\kappa(\mathbf{x}) = \max_k|x_k|/\min_{k,x_k\ne 0}|x_k|$.

Proof. Assume that all entries of $\mathbf{x}$ are nonzero; otherwise it suffices to focus on the
nonzero entries of $\mathbf{x}$. To prepare $|x\rangle$, one just needs to choose $|v_j\rangle = |j\rangle$, in which case
$O(T_{\rm in}) = O(1)$. So the complexity is $O((T_{\rm in} + \log m)\sum_j|\alpha_j|/\|\mathbf{y}\|) = O(\kappa(\mathbf{x})\log m)$,
since $\|\mathbf{y}\| \ge m\min_j|x_j|$ and $s \le m\max_j|x_j|$.

The above result is the same as the result given by Clader et al. [10]. Actually,
based on the LCU given above, the quantum state can be prepared more efficiently.

Proposition 4. Let $\mathbf{x} = (x_0, \ldots, x_{m-1})$ be a given vector. Then the quantum state of $\mathbf{x}$
can be prepared in time $O(\sqrt{\log\kappa(\mathbf{x})}\,\log m)$, where $\kappa(\mathbf{x}) = \max_k|x_k|/\min_{k,x_k\ne 0}|x_k|$.

Proof. For simplicity, assume that $|x_0| = \min_{k,x_k\ne 0}|x_k|$. Find the minimal $q$ such
that $\kappa(\mathbf{x}) \le 2^q$, so $q \approx \log\kappa(\mathbf{x})$. For any $1 \le j \le q$, there are several entries of $\mathbf{x}$
whose absolute values lie in the interval $[2^{j-1}|x_0|, 2^j|x_0|)$. Define $\mathbf{y}_j$ as the $m$-dimensional
vector obtained by placing these entries in the same positions as in $\mathbf{x}$ and zeros elsewhere.
Then $\mathbf{x} = \mathbf{y}_1 + \cdots + \mathbf{y}_q$. For any $j$, we have $\kappa(\mathbf{y}_j) \le 2$,
so the quantum state $|y_j\rangle$ of the vector $\mathbf{y}_j$ can be prepared efficiently in time $O(\log m)$ by
Proposition 3. We also have $|x\rangle = \lambda_1|y_1\rangle + \cdots + \lambda_q|y_q\rangle$, where $\lambda_j = \|\mathbf{y}_j\|/\|\mathbf{x}\|$. By
the LCU method given above, the complexity to achieve such a linear combination to
obtain $|x\rangle$ equals
$O((\log m)\sum_{j=1}^q\|\mathbf{y}_j\|/\|\mathbf{x}\|) = O(\sqrt{q}\log m) = O(\sqrt{\log\kappa(\mathbf{x})}\,\log m)$, where
the first identity follows from the relation between the 1-norm and 2-norm of vectors, more
precisely from $\sum_{j=1}^q\|\mathbf{y}_j\| \le \sqrt{q}\sqrt{\sum_{j=1}^q\|\mathbf{y}_j\|^2} = \sqrt{q}\,\|\mathbf{x}\|$.
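The decomposition used in this proof is easy to state procedurally. Below is a small Python sketch (the helper dyadic_split is ours, for illustration under the assumption of a real input vector) that splits $\mathbf{x}$ into the pieces $\mathbf{y}_1, \ldots, \mathbf{y}_q$ with $\kappa(\mathbf{y}_j) \le 2$ and checks the norm inequality used in the last step.

```python
import numpy as np

def dyadic_split(x):
    """Split x into pieces whose nonzero entries lie in dyadic magnitude bands."""
    x = np.asarray(x, dtype=float)
    mags = np.abs(x[x != 0])
    x_min = mags.min()
    q = int(np.ceil(np.log2(mags.max() / x_min))) or 1
    pieces = []
    for j in range(1, q + 1):
        lo, hi = 2 ** (j - 1) * x_min, 2 ** j * x_min
        # the top band is closed on the right so the maximal entry is not dropped
        mask = (np.abs(x) >= lo) & ((np.abs(x) < hi) | (j == q))
        pieces.append(np.where(mask, x, 0.0))
    return pieces

x = np.array([1.0, -3.0, 0.0, 8.0, 2.5])
pieces = dyadic_split(x)
print(all(p[p != 0].size == 0
          or np.max(np.abs(p[p != 0])) / np.min(np.abs(p[p != 0])) <= 2
          for p in pieces))                                   # kappa(y_j) <= 2
print(np.allclose(sum(pieces), x))                            # the pieces sum back to x
print(sum(np.linalg.norm(p) for p in pieces)
      <= np.sqrt(len(pieces)) * np.linalg.norm(x))            # 1-norm vs 2-norm bound
```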

Compared with LCU, parallel swap test achieves a similar task except that the
coefficients $\alpha_j$ are not given directly. One main idea about parallel swap test that we will
use frequently in the following simulation of the multilayer perceptron on a quantum computer
is the following: given $2l$ quantum states $|v_j^\pm\rangle$, we can compute the inner products $\langle v_j^+|v_j^-\rangle$
in parallel. More precisely, first prepare the quantum state
$$\frac{1}{\sqrt{l}}\sum_{j=0}^{l-1}|j\rangle \otimes \frac{1}{\sqrt{2}}\big(|v_j^+\rangle|0\rangle + |v_j^-\rangle|1\rangle\big). \qquad (8)$$
For each $\frac{1}{\sqrt{2}}(|v_j^+\rangle|0\rangle + |v_j^-\rangle|1\rangle)$ there is a unitary operator $G_j$ whose eigenvalues contain
the information about $\langle v_j^+|v_j^-\rangle$. Applying quantum phase estimation to $\sum_j|j\rangle\langle j|\otimes G_j$
with the initial state (8), we get
$$\frac{1}{\sqrt{l}}\sum_{j=0}^{l-1}|j\rangle \otimes \frac{1}{\sqrt{2}}\big(|v_j^+\rangle|0\rangle + |v_j^-\rangle|1\rangle\big)\big|\langle v_j^+|v_j^-\rangle\big\rangle.$$
With this quantum state, we can perform any further operations we want on
$\langle v_j^+|v_j^-\rangle$, such as a controlled rotation that puts $\langle v_j^+|v_j^-\rangle$ into the amplitudes, as the
HHL algorithm does:
$$\frac{1}{\sqrt{l}}\sum_{j=0}^{l-1}|j\rangle \otimes \frac{1}{\sqrt{2}}\big(|v_j^+\rangle|0\rangle + |v_j^-\rangle|1\rangle\big)\big|\langle v_j^+|v_j^-\rangle\big\rangle
\Big[\langle v_j^+|v_j^-\rangle|0\rangle + \sqrt{1 - \langle v_j^+|v_j^-\rangle^2}\,|1\rangle\Big].$$
Undoing the quantum phase estimation to remove $|\langle v_j^+|v_j^-\rangle\rangle$, one obtains
$$\frac{1}{\sqrt{l}}\sum_{j=0}^{l-1}\langle v_j^+|v_j^-\rangle|j\rangle|0\rangle
+ \frac{1}{\sqrt{l}}\sum_{j=0}^{l-1}\sqrt{1 - \langle v_j^+|v_j^-\rangle^2}\,|j\rangle|1\rangle. \qquad (9)$$
This simple idea lets us handle the $\langle v_j^+|v_j^-\rangle$ in parallel; more complicated
procedures can be obtained similarly. The complexity to obtain (9) is $O((\log l)/\epsilon)$. If
we applied swap test to compute all $\langle v_j^+|v_j^-\rangle$ separately and then applied LCU to get (9), the
complexity would be $O(l/\epsilon)$. So for this problem, parallel swap test performs much better.

4. Quantum simulation of Rosenblatt’s perceptron

4.1. Rosenblatt’s perceptron


The perceptron was invented by Rosenblatt in 1958 [26] and was the first
algorithmically described neural network. The perceptron is also the simplest form of
a neural network used for the (supervised) classification of patterns that are promised
to be linearly separable. The structure of a perceptron is very simple. It consists of
m input neurons with adjustable (synaptic) weights, a bias and an output neuron (see
figure 1 below).

Figure 1. The graph of a perceptron: $x_1, \ldots, x_m$ are the input signals, $w_1, \ldots, w_m$ are the
weights, and $b$ is the bias. The output is $\varphi(\sum_i x_i w_i + b)$, where $\varphi$ is a threshold function.

Denote the input and the weight as $(m+1)$-dimensional vectors $\mathbf{x} = (x_0, x_1, \ldots, x_m)$
and $\mathbf{w} = (w_0, w_1, \ldots, w_m)$, where $x_0 = 1$ and $w_0 = b$. The output is defined by a threshold
function
$$y = \varphi(\mathbf{x}\cdot\mathbf{w}) = \begin{cases} 1, & \text{if } \mathbf{w}\cdot\mathbf{x} > 0; \\ -1, & \text{if } \mathbf{w}\cdot\mathbf{x} \le 0. \end{cases} \qquad (10)$$
For a classification problem, this means $y = 1$ if $\mathbf{x}$ belongs to the first class and $y = -1$ if $\mathbf{x}$
belongs to the second class. The hyperplane $\mathbf{w}\cdot\mathbf{x} = 0$ defines the boundary of the
classification (see figure 2).

Figure 2. Linearly separable training samples. There is a hyperplane dividing the two
classes.

Rosenblatt also defined a learning algorithm (or updating rule) to adjust the weights
of the perceptron when it makes wrong decisions. More precisely, given a set of training
samples $\{(\mathbf{x}^t, r^t) : t = 0, 1, \ldots, d-1\}$, where $r^t$ is the desired output of $\mathbf{x}^t$, that is,
$$r^t = \begin{cases} 1, & \text{if } \mathbf{x}^t \text{ belongs to the first class;} \\ -1, & \text{if } \mathbf{x}^t \text{ belongs to the second class,} \end{cases} \qquad (11)$$
the updating rule is defined as
$$\mathbf{w}^{\rm new} = \mathbf{w}^{\rm old} + \eta(r^t - y^t)\,\mathbf{x}^t \qquad (12)$$
for a randomly chosen sample $(\mathbf{x}^t, r^t)$, where $\eta \in [0,1]$ is called the learning factor and
$y^t = \varphi(\mathbf{x}^t\cdot\mathbf{w}^{\rm old})$.
Rosenblatt proved that this learning rule converges after a finite number of iterations,
a result now known as the perceptron convergence theorem. Briefly, from the updating
rule (12): if $r^t = y^t$, there is no update. If $r^t > y^t$, i.e. $r^t = 1$ and $y^t = -1$, then
$\mathbf{w}^{\rm new} = \mathbf{w}^{\rm old} + 2\eta\mathbf{x}^t$. Since $r^t = 1$, the sample $\mathbf{x}^t$ belongs to the first class; however,
$y^t = -1$ means we made the wrong decision to put $\mathbf{x}^t$ into the second class, so we
should increase the value of $y^t$. With the updated weight $\mathbf{w}^{\rm new}$, the new output of $\mathbf{x}^t$
equals $\varphi(\mathbf{x}^t\cdot\mathbf{w}^{\rm old} + 2\eta\,\mathbf{x}^t\cdot\mathbf{x}^t)$, whose argument is larger than that of the old output
$\varphi(\mathbf{x}^t\cdot\mathbf{w}^{\rm old})$. A similar analysis holds if $r^t < y^t$. More details about the perceptron
convergence theorem can be found in [13, Chapter 1]. Such a learning algorithm is also called online
learning [1], since the training only requires one sample at each step. Online learning has many
advantages: it saves the cost of storing the training samples, it is easy to
implement, and so on [1, Chapter 11], [13, Chapter 4].
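For reference, a minimal classical implementation of the threshold output (10) and the online updating rule (12) might look as follows (the toy data and the helper name train_perceptron are ours, assuming a small linearly separable set).

```python
import numpy as np

def train_perceptron(samples, labels, eta=0.5, max_epochs=100):
    """Online perceptron training with the threshold output (10) and update (12)."""
    X = np.hstack([np.ones((len(samples), 1)), samples])   # x_0 = 1 carries the bias
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        updated = False
        for x, r in zip(X, labels):
            y = 1 if np.dot(w, x) > 0 else -1               # Eq. (10)
            if y != r:
                w = w + eta * (r - y) * x                   # Eq. (12)
                updated = True
        if not updated:                                      # converged (perceptron theorem)
            break
    return w

samples = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -0.5], [-2.0, -1.5]])
labels = np.array([1, 1, -1, -1])
print(train_perceptron(samples, labels))
```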

4.2. Simulation in quantum computer


In this subsection we study the corresponding quantum model of a perceptron, which
includes the reading of the output signal and the learning algorithm. Due to the simple
structure of a perceptron, the quantum model is also not complicated. In [30], Schuld
et al. proposed a quantum algorithm to read the output signal by quantum phase
estimation. Actually, getting the output of a perceptron is not so difficult on a quantum
computer, since it only requires estimating the inner product of two vectors, and swap
test achieves this efficiently, as described in section 2. In the following, we therefore
focus mainly on Rosenblatt's learning algorithm for a perceptron.

From equation (10), the output $y$ only depends on the sign of $\mathbf{w}\cdot\mathbf{x}$, so we only
need the normalized vectors of $\mathbf{x}$ and $\mathbf{w}$, that is, the quantum states of $\mathbf{x}$
and $\mathbf{w}$. Define
$$|x\rangle = \frac{1}{\|\mathbf{x}\|}\sum_{j=0}^{m} x_j|j\rangle, \qquad |w\rangle = \frac{1}{\|\mathbf{w}\|}\sum_{j=0}^{m} w_j|j\rangle.$$
These two quantum states can be prepared efficiently on a quantum computer by the
quantum algorithms given in section 3. By swap test, the output $y = \varphi(\langle x|w\rangle)$ can
be estimated to precision $\epsilon$ in time $O((\log m)/\epsilon)$. The updating rule (12) can be
achieved on a quantum computer in the following way:

Algorithm 1: Training a perceptron on a quantum computer

Step 1. Randomly choose a sample $(\mathbf{x}^t, r^t)$.

Step 2. Compute $y^t = \varphi(\mathbf{x}^t\cdot\mathbf{w}^{\rm old})$ by swap test. If $r^t = y^t$, go to Step 1; otherwise go to Step 3.

Step 3. Prepare the quantum state $|w^{\rm new}\rangle$ of $\mathbf{w}^{\rm new}$ by the following procedure.

Step 3.1. Prepare the quantum state $\frac{1}{\sqrt{2}}(|x^t\rangle|0\rangle + |w^{\rm old}\rangle|1\rangle)$.

Step 3.2. Apply a controlled rotation on it to prepare
$$\frac{1}{\sqrt{2}}|x^t\rangle|0\rangle\Big[s\eta(r^t - y^t)\|\mathbf{x}^t\|\,|0\rangle + \sqrt{1 - s^2\eta^2(r^t - y^t)^2\|\mathbf{x}^t\|^2}\,|1\rangle\Big]
+ \frac{1}{\sqrt{2}}|w^{\rm old}\rangle|1\rangle\Big[s\|\mathbf{w}^{\rm old}\|\,|0\rangle + \sqrt{1 - s^2\|\mathbf{w}^{\rm old}\|^2}\,|1\rangle\Big],$$
where $s = 1/\max\{\eta|r^t - y^t|\,\|\mathbf{x}^t\|, \|\mathbf{w}^{\rm old}\|\} = 1/\max\{2\eta\|\mathbf{x}^t\|, \|\mathbf{w}^{\rm old}\|\}$.

Step 3.3. Apply a Hadamard transformation on the second register to get
$$\frac{s}{2}\,\|\mathbf{w}^{\rm new}\|\,|w^{\rm new}\rangle|0,0\rangle + \text{orthogonal part}.$$

Step 4. Go to Step 1 until convergence.
In Step 3.3 we do not need to perform a measurement to get $|w^{\rm new}\rangle$ exactly. There
are several advantages to performing no measurements in this step:

(1). Performing measurements would increase the complexity of the whole learning algorithm,
since the complexity to get $|w^{\rm new}\rangle$ exactly is $O(\max\{2\eta\|\mathbf{x}^t\|, \|\mathbf{w}^{\rm old}\|\}(\log m)/\|\mathbf{w}^{\rm new}\|)$,
which can be very large. If we do not perform measurements, the complexity to get the
quantum state in Step 3.3 is just $O(\log m)$.

(2). The updating formula (12) also needs the information of $\|\mathbf{w}^{\rm old}\|$, which is
already contained in the quantum state obtained in Step 3.3.

(3). We can normalize $\mathbf{x}^t$ and $\mathbf{w}^{\rm old}$ in the initial step, so that $s = \max\{2\eta, 1\}$ is
small. This causes no problems in the following iterations: to compute $y^t$ based on the new
weight, we apply swap test to estimate the inner product between $|x^t\rangle$ and the quantum
state in Step 3.3. The only influence is the factor $s/2$, which is a small constant and does
not affect the output.

(4). Moreover, denote the quantum state in Step 3.3 by $|W^{\rm new}\rangle$. Then in the next
iteration we do not need to implement Steps 3.1-3.3; instead we can implement

Step 3.1'. Prepare the quantum state $\frac{1}{\sqrt{1+s^2\eta^2}}\big(\frac{1}{2}s\eta(r^t - y^t)|x^t\rangle|0,0\rangle|0\rangle + |W^{\rm new}\rangle|1\rangle\big)$.

Step 3.2'. Apply a Hadamard operator on the last qubit; then we have
$$\frac{s}{2\sqrt{1+s^2\eta^2}}\Big[\eta(r^t - y^t)|x^t\rangle + \|\mathbf{w}^{\rm new}\|\,|w^{\rm new}\rangle\Big]|0,0\rangle + \text{orthogonal part}.$$

As we can see, after another iteration the coefficient changes from $\lambda_1 = s/2$ to
$\lambda_2 = \lambda_1/\sqrt{1 + 4\lambda_1^2\eta^2}$. It is not hard to show that after $n$ iterations
the coefficient becomes $\lambda_n = 1/\sqrt{4s^{-2} + 4(n-1)\eta^2}$. By the perceptron convergence
theorem, assume that the learning algorithm stops after $n$ iterations; then the
final quantum state has the form $\lambda_n\|\mathbf{w}^{\rm final}\|\,|w^{\rm final}\rangle|0\rangle + \text{orthogonal part}$. The complexity
to get this state has two parts:
(1). Preparing the quantum state in Step 3: at the $j$-th iteration, the
complexity is $O(j\log m)$, since we perform no measurements.

(2). Estimating the value of $y^t$ by swap test: at the $j$-th iteration, the
coefficient is $\lambda_j$. So to make the error in estimating $y^t$ of size $\epsilon$, the error chosen
in swap test should be $\epsilon\lambda_j$. The complexity to estimate $y^t$ is then $O(j^{3/2}\eta(\log m)/\epsilon)$, since the
quantum state preparation costs $O(j\log m)$ as discussed in (1).

Therefore, the final complexity of the learning algorithm of a perceptron on a
quantum computer equals
$$O(n^{3/2}(\log m)\eta/\epsilon + n\log m). \qquad (13)$$
On a classical computer, estimating the inner product of $\mathbf{x}$ and $\mathbf{w}$ costs $O(m)$, so the
complexity of the classical learning algorithm of a perceptron after $n$ iterations is $O(mn)$.
Compared with the classical learning algorithm, the quantum algorithm achieves an exponential
speedup in $m$; as a compensation, its dependence on $n$ is worse than the classical algorithm.

5. Quantum simulation of multilayer perceptron

5.1. Multilayer perceptron


The multilayer perceptron is a neural network with one or more hidden layers (see
figure 3); it contains Rosenblatt's perceptron as a special case. A perceptron can only
perform classification of linearly separable patterns, whereas a
multilayer perceptron can accomplish much more complicated tasks, such as nonlinear
classification, function approximation, speech recognition, image recognition and so on
[13, 19]. Instead of a threshold function, a multilayer perceptron often uses a
nonlinear differentiable activation function. The multilayer perceptron exhibits a high degree
of interconnection, which makes its structure more complicated than a perceptron's, and
this is one reason why it can achieve more complicated tasks.

Figure 3. Graph of a multilayer perceptron with one hidden layer.

The most commonly used activation function in the multilayer perceptron is the sigmoid
function, defined as
$$\mathrm{sigmoid}(a, \beta) = \frac{1}{1 + \exp(-a\beta)}. \qquad (14)$$
For any fixed $\beta$, the sigmoid function $\mathrm{sigmoid}(a, \beta)$ increases from 0 to 1 as $a$ grows
from $-\infty$ to $+\infty$. As $\beta \to +\infty$, the sigmoid function approaches
the threshold function. In the following, we only focus on the activation function
$\mathrm{sigmoid}(a) := \mathrm{sigmoid}(a, 1)$ with $\beta = 1$.
If the input vector is $\mathbf{x} \in \mathbb{R}^m$ and the weight vectors in the hidden layer are
$\mathbf{w}_0, \ldots, \mathbf{w}_{n-1} \in \mathbb{R}^m$, then the $n$ outputs of the hidden layer are $y_i = \mathrm{sigmoid}(\mathbf{x}\cdot\mathbf{w}_i)$,
where $i = 0, 1, \ldots, n-1$. Denote $\mathbf{y} = (y_0, \ldots, y_{n-1})$. Assume that the weight vectors
in the output layer are $\mathbf{v}_0, \ldots, \mathbf{v}_{p-1} \in \mathbb{R}^n$; then the outputs are $z_j = \mathrm{sigmoid}(\mathbf{y}\cdot\mathbf{v}_j)$,
where $j = 0, 1, \ldots, p-1$. Obviously, if $p < m$, the multilayer perceptron performs
a dimensionality reduction. It has been shown in [7] that a multilayer perceptron with
one hidden layer can implement principal component analysis, except that the hidden
weights are not the eigenvectors sorted by importance but span the same space as the
principal eigenvectors.
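As a classical point of reference for the notation just introduced (a sketch, not the quantum procedure of section 5.2), the forward pass of this two-layer network can be written as follows; the matrices W and V below collect the row vectors $\mathbf{w}_i$ and $\mathbf{v}_j$.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))           # Eq. (14) with beta = 1

def forward(x, W, V):
    """Forward pass: W has shape (n, m) with rows w_i; V has shape (p, n) with rows v_j."""
    y = sigmoid(W @ x)                        # hidden outputs y_i = sigmoid(x . w_i)
    z = sigmoid(V @ y)                        # final outputs  z_j = sigmoid(y . v_j)
    return y, z

rng = np.random.default_rng(1)
m, n, p = 4, 3, 2
x = rng.standard_normal(m)
W = rng.standard_normal((n, m))
V = rng.standard_normal((p, n))
y, z = forward(x, W, V)
print(y, z)
```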
Training a multilayer perceptron is similar to training a perceptron; the only
difference is that now each output is a nonlinear function of the inputs, which makes
the learning algorithm of a multilayer perceptron more complicated than that of a perceptron.
The learning algorithm of the multilayer perceptron is called back-propagation, which was
first discovered by Rumelhart, Hinton and Williams [27] in 1986. The development of
the back-propagation algorithm represented a landmark in neural networks in that it
provided a computationally efficient method for the training of multilayer perceptrons.
The back-propagation learning algorithm is based on the chain rule of calculus
for computing the derivatives of composite functions. Usually, the learning algorithm for
the weights in the output layer is simpler than that for the weights in the hidden layers.
Assume that there is only one hidden layer. Given a set of training samples
$\{(\mathbf{x}^t, \mathbf{r}^t) : t = 0, 1, \ldots, d-1\}$, where $\mathbf{r}^t \in \mathbb{R}^p$ is the desired output of $\mathbf{x}^t$ in
the output layer, the error (called the error energy averaged over the training sample, or
the empirical risk) of the network is defined by $E = \frac{1}{2}\sum_{i,t}(r_i^t - z_i^t)^2$, where $\mathbf{z}^t$ is the
vector with components $z_i^t = \mathrm{sigmoid}(\mathbf{y}^t\cdot\mathbf{v}_i)$ and $\mathbf{y}^t$ is the vector with components
$y_j^t = \mathrm{sigmoid}(\mathbf{x}^t\cdot\mathbf{w}_j)$. The back-propagation learning algorithm is obtained by the
gradient descent method [1, 13] as follows. In the gradient descent method, we need to
compute the gradient of the error function with respect to $\mathbf{w}_i$ and $\mathbf{v}_j$. First, consider the
computation of the gradient of $E$ with respect to $\mathbf{v}_j$. Direct calculation shows that
$$\frac{\partial E}{\partial v_{jl}} = -\sum_t (r_j^t - z_j^t)\frac{\partial z_j^t}{\partial v_{jl}}
= -\sum_t (r_j^t - z_j^t)z_j^t(1 - z_j^t)y_l^t.$$
So the updating rule of $\mathbf{v}_j$ is
$$\mathbf{v}_j^{\rm new} = \mathbf{v}_j^{\rm old} + \eta\sum_t\big(r_j^t - z_j^t\big)z_j^t\big(1 - z_j^t\big)\,\mathbf{y}^t. \qquad (15)$$
As for the gradient of $E$ with respect to $\mathbf{w}_j$, by the chain rule we have
$$\frac{\partial E}{\partial w_{jk}} = -\sum_{i,t}(r_i^t - z_i^t)\frac{\partial z_i^t}{\partial y_j^t}\frac{\partial y_j^t}{\partial w_{jk}}
= -\sum_{i,t}(r_i^t - z_i^t)z_i^t(1 - z_i^t)v_{ij}\,y_j^t(1 - y_j^t)x_k^t.$$
So the updating rule of $\mathbf{w}_j$ is
$$\mathbf{w}_j^{\rm new} = \mathbf{w}_j^{\rm old}
+ \eta\sum_t\Big[\sum_i (r_i^t - z_i^t)z_i^t(1 - z_i^t)v_{ij}\Big]y_j^t(1 - y_j^t)\,\mathbf{x}^t. \qquad (16)$$
Different choices of the activation function lead to different updating rules for $\mathbf{v}_j$
and $\mathbf{w}_j$; in this paper we only focus on the above two updating rules, and all other choices
can be studied similarly. The learning algorithm based on (15) and (16) is called batch
learning, which uses all the samples in each epoch. Another type of learning, called
online learning, uses only one sample at each step. In online learning, a
sample $(\mathbf{x}^t, \mathbf{r}^t)$ is chosen arbitrarily, and the error (called the total instantaneous error energy)
is defined by $E^t = \frac{1}{2}\sum_i(r_i^t - z_i^t)^2$. A similar analysis shows that the updating rules of
online learning are
$$\mathbf{v}_j^{\rm new} = \mathbf{v}_j^{\rm old} + \eta\big(r_j^t - z_j^t\big)z_j^t\big(1 - z_j^t\big)\,\mathbf{y}^t, \qquad
\mathbf{w}_j^{\rm new} = \mathbf{w}_j^{\rm old}
+ \eta\Big[\sum_i (r_i^t - z_i^t)z_i^t(1 - z_i^t)v_{ij}\Big]y_j^t(1 - y_j^t)\,\mathbf{x}^t. \qquad (17)$$

5.2. Simulation in quantum computer: reading the output data of each layer
Unlike a perceptron, the outputs of each layer and the learning algorithms of a
multilayer perceptron have strongly nonlinear features, while a quantum computer only
allows unitary operations, which are linear. This is one reason why it is so
hard to integrate the good features of the multilayer perceptron and quantum computing
into a useful quantum model. However, we will show in this and
the next subsection that integrating the nonlinear structure of the multilayer perceptron
into a quantum computer is not as difficult as it looks.
In this subsection, we concentrate on reading the outputs of each layer. First, we
show how to get the output vector $\mathbf{y}$ of the hidden layer. Its $i$-th component
equals $y_i = \mathrm{sigmoid}(\mathbf{x}\cdot\mathbf{w}_i)$, where $i = 0, 1, \ldots, n-1$. If we computed all the components by
swap test, the complexity would be at least $O(n)$. In the following, we give another method
to construct the quantum data of $\mathbf{y}$. For simplicity, in this subsection we use $\varphi$ to denote the
sigmoid function, although it was used to denote the threshold function
in Rosenblatt's perceptron. The procedure for preparing the quantum state of $\mathbf{y}$ is as
follows:

Algorithm 2: Reading the output of the hidden layer on a quantum computer

Step 1. Prepare $\frac{1}{\sqrt{n}}\sum_{i=0}^{n-1}|i\rangle$.

Step 2. Apply a controlled operation to prepare the following quantum state in parallel:
$$\frac{1}{\sqrt{n}}\sum_{i=0}^{n-1}|i\rangle \otimes \frac{1}{\sqrt{\|\mathbf{x}\|^2 + \|\mathbf{w}_i\|^2}}
\Big[\|\mathbf{x}\|\,|x\rangle|0\rangle + \|\mathbf{w}_i\|\,|w_i\rangle|1\rangle\Big].$$

Step 3. Apply parallel swap test on the above quantum state to generate
$$\frac{1}{\sqrt{n}}\sum_{i=0}^{n-1}|i\rangle \otimes \frac{1}{\sqrt{\|\mathbf{x}\|^2 + \|\mathbf{w}_i\|^2}}
\Big[\|\mathbf{x}\|\,|x\rangle|0\rangle + \|\mathbf{w}_i\|\,|w_i\rangle|1\rangle\Big]\,|\mathbf{x}\cdot\mathbf{w}_i\rangle. \qquad (18)$$

Step 4. Apply a controlled rotation on the above quantum state and undo Steps 2-3:
$$\frac{1}{\sqrt{n}}\sum_{i=0}^{n-1}|i\rangle\Big[\varphi(\mathbf{x}\cdot\mathbf{w}_i)|0\rangle + \sqrt{1 - \varphi(\mathbf{x}\cdot\mathbf{w}_i)^2}\,|1\rangle\Big]
= \frac{1}{\sqrt{n}}\sum_{i=0}^{n-1}\varphi(\mathbf{x}\cdot\mathbf{w}_i)|i\rangle|0\rangle
+ \frac{1}{\sqrt{n}}\sum_{i=0}^{n-1}\sqrt{1 - \varphi(\mathbf{x}\cdot\mathbf{w}_i)^2}\,|i\rangle|1\rangle. \qquad (19)$$

Explanation of the controlled operation in Step 2: first prepare the quantum state
$|i\rangle \otimes \frac{1}{\sqrt{\|\mathbf{x}\|^2+\|\mathbf{w}_i\|^2}}[\|\mathbf{x}\||0\rangle + \|\mathbf{w}_i\||1\rangle]$, then conditionally prepare $|x\rangle$ and $|w_i\rangle$ based on
the control qubit being $|0\rangle$ or $|1\rangle$. Because of the control register $|i\rangle$, these quantum states
can be prepared in parallel in Step 2.

In Step 3, swap test returns a value $T_i$ such that $|T_i - \mathbf{x}\cdot\mathbf{w}_i/(\|\mathbf{x}\|^2 + \|\mathbf{w}_i\|^2)| \le \epsilon$,
which leads to a good approximation of $\mathbf{x}\cdot\mathbf{w}_i$ with a small relative error. The first part
of (19) is the desired quantum state $|y\rangle$ of $\mathbf{y}$. Similarly to the discussion of the quantum
simulation of a perceptron in section 4, we do not need to perform a measurement to get
$|y\rangle$ exactly. Therefore, the final complexity of the four steps is
$$O((\log mn)/\epsilon). \qquad (20)$$
This achieves an exponential speedup over the classical method, whose complexity is
$O(mn)$.
Remark 1. In the following, any quantum state similar to (19) will be abbreviated
as $\frac{1}{\sqrt{n}}\sum_{i=0}^{n-1}\varphi(\mathbf{x}\cdot\mathbf{w}_i)|i\rangle|0\rangle + |0\rangle^\perp$; that is, we only show the part
we are interested in. Here $|0\rangle^\perp$ carries two meanings: the hidden quantum
state is orthogonal to the first part, and there exists a qubit that can help
us distinguish the desired and undesired quantum states. Sometimes there may be
several such qubits (such as $|0,0\rangle$), but we will still write $|0\rangle$ when the number
of qubits is not important.
The preparation of the quantum state of the vector $\mathbf{z}$ in the output layer is similar to
the above procedure; this case shows the advantage of performing no measurements
in (19). Denoting the quantum state in (19) by $|Y\rangle$, the quantum
state of $\mathbf{z}$ can be obtained in the following way:

Algorithm 3: Reading the output of the output layer on a quantum computer

Step 1. Prepare $\frac{1}{\sqrt{p}}\sum_{i=0}^{p-1}|i\rangle$.

Step 2. Apply a controlled operation to prepare
$$\frac{1}{\sqrt{p}}\sum_{i=0}^{p-1}|i\rangle \otimes \frac{1}{\sqrt{1 + \|\mathbf{v}_i\|^2}}
\Big[|Y\rangle|0\rangle + \|\mathbf{v}_i\|\,|v_i\rangle|0\rangle|1\rangle\Big].$$

Step 3. Apply parallel swap test on the above quantum state, which yields
$$\frac{1}{\sqrt{p}}\sum_{i=0}^{p-1}|i\rangle \otimes \frac{1}{\sqrt{1 + \|\mathbf{v}_i\|^2}}
\Big[|Y\rangle|0\rangle + \|\mathbf{v}_i\|\,|v_i\rangle|0\rangle|1\rangle\Big]\,|\mathbf{y}\cdot\mathbf{v}_i\rangle. \qquad (21)$$

Step 4. Apply a controlled rotation on the above quantum state and undo Steps 2-3:
$$\frac{1}{\sqrt{p}}\sum_{i=0}^{p-1}\varphi(\mathbf{y}\cdot\mathbf{v}_i)|i\rangle|0\rangle + |0\rangle^\perp. \qquad (22)$$

The whole procedure is the same as the preparation of the quantum information of
$\mathbf{y}$; the only difference is Step 2, which now involves $|Y\rangle$ and $|v_i\rangle$. Similarly, in Step 3 we
obtain a value $R_i$ such that
$$\Big|R_i - \frac{1}{\sqrt{n}}\frac{\mathbf{y}\cdot\mathbf{v}_i}{\sqrt{1 + \|\mathbf{v}_i\|^2}}\Big| \le \epsilon_0 \qquad (23)$$
in time $O((\log mnp)/\epsilon_0)$. So to make $R_i\sqrt{n}$ a good approximation of $\mathbf{y}\cdot\mathbf{v}_i$ with small
relative error $\epsilon$, we choose $\epsilon_0$ such that $\epsilon_0\sqrt{n} = \epsilon$. Therefore, the final complexity of the
above procedure is
$$O((\log mnp)\sqrt{n}/\epsilon^2). \qquad (24)$$
Note that the classical algorithm to get the output has complexity $O(n(m+p))$. So
the quantum algorithm achieves a quadratic speedup in $n$ and an exponential speedup in
$m$ and $p$. The quantum information of $\mathbf{z}$ lies in the first part of the quantum state (22).
Remark 2. In the above construction, if we replace the sigmoid function by its general
form, the complexity can be improved. More precisely, in the preparation of the
quantum state of $\mathbf{z}$, if we change the function $\mathrm{sigmoid}(\mathbf{y}\cdot\mathbf{v}_i)$ into $\mathrm{sigmoid}(\mathbf{y}\cdot\mathbf{v}_i, 1/\sqrt{n})$,
then the complexity to prepare the quantum state (22) can be simply written as
$O((\log mnp)/\epsilon^2)$. This really achieves an exponential speedup over the classical method.

5.3. Simulation in quantum computer: online learning algorithm


Since online learning is simpler than batch learning, we first discuss the simulation of
online learning (17) on a quantum computer. The updating rule (17) for $\mathbf{v}_j$ needs the
information of $z_j^t = \varphi(\mathbf{y}^t\cdot\mathbf{v}_j)$ and $\mathbf{y}^t$. Denote by $|Y^t\rangle$ the quantum state (19) with $\mathbf{x}$
replaced by $\mathbf{x}^t$; then the quantum information of $\mathbf{y}^t$ is stored in $|Y^t\rangle$. The calculation of $z_j^t$
relies on swap test, similarly to (21). The online learning algorithm for $\mathbf{v}_j$ can now be
described as follows (the procedure is similar to the training of a perceptron):
Algorithm 4: Online training of the weights in the output layer on a quantum computer

Step 1. Randomly choose a sample $(\mathbf{x}^t, \mathbf{r}^t)$.

Step 2. Prepare $|Y^t\rangle$ as in (19). Then compute $z_j^t = \varphi(\mathbf{y}^t\cdot\mathbf{v}_j^{\rm old})$ by applying swap
test on $\frac{1}{\sqrt{1+\|\mathbf{v}_j^{\rm old}\|^2}}[|Y^t\rangle|0\rangle + \|\mathbf{v}_j^{\rm old}\|\,|v_j^{\rm old}\rangle|0\rangle|1\rangle]$.
If $r_j^t \approx z_j^t$, or $z_j^t \approx 0$ or $1$, go to Step 1; otherwise go to Step 3.

Step 3. Prepare the quantum state $|v_j^{\rm new}\rangle$ of $\mathbf{v}_j^{\rm new}$ by the following procedure.

Step 3.1. Prepare the quantum state $\frac{1}{\sqrt{2}}(|Y^t\rangle|0\rangle + |v_j^{\rm old}\rangle|1\rangle)$.

Step 3.2. Apply a controlled rotation on it to prepare
$$\frac{1}{\sqrt{2}}|Y^t\rangle|0\rangle\Big[s\eta(r_j^t - z_j^t)z_j^t(1-z_j^t)|0\rangle
+ \sqrt{1 - s^2\eta^2\big[(r_j^t - z_j^t)z_j^t(1-z_j^t)\big]^2}\,|1\rangle\Big]
+ \frac{1}{\sqrt{2}}|v_j^{\rm old}\rangle|1\rangle\Big[\frac{s\|\mathbf{v}_j^{\rm old}\|}{\sqrt{n}}|0\rangle
+ \sqrt{1 - \frac{s^2\|\mathbf{v}_j^{\rm old}\|^2}{n}}\,|1\rangle\Big],$$
where $s = 1/\max\{\eta|r_j^t - z_j^t|z_j^t(1-z_j^t), \|\mathbf{v}_j^{\rm old}\|/\sqrt{n}\}
\le 1/\max\{\eta/2, \|\mathbf{v}_j^{\rm old}\|/\sqrt{n}\}$.

Step 3.3. Apply a Hadamard transformation on the second register to get
$$\frac{s}{2\sqrt{n}}\,\|\mathbf{v}_j^{\rm new}\|\,|v_j^{\rm new}\rangle|0\rangle + |0\rangle^\perp.$$

Step 4. Go to Step 1 until convergence.
The complexity of one iteration is the same as (24), except for the factor $\log p$.
Because of the factor $s/2\sqrt{n}$ in the result of Step 3.3, and since $s$ is small, the
next iteration becomes easier, as follows. Denote the quantum state in Step 3.3 by $|V_j^{\rm new}\rangle$.

Step 3.1'. Prepare the quantum state
$$\frac{2}{\sqrt{4 + [s\eta(r_j^t - z_j^t)z_j^t(1-z_j^t)]^2}}
\Big[\frac{s\eta(r_j^t - z_j^t)z_j^t(1-z_j^t)}{2}|Y^t\rangle|0\rangle + |V_j^{\rm new}\rangle|1\rangle\Big].$$

Step 3.2'. Apply a Hadamard transformation on the second register; then we have
$$\frac{s/\sqrt{n}}{\sqrt{4 + [s\eta(r_j^t - z_j^t)z_j^t(1-z_j^t)]^2}}
\Big[\|\mathbf{v}_j^{\rm new}\|\,|v_j^{\rm new}\rangle + \eta(r_j^t - z_j^t)z_j^t(1-z_j^t)\|\mathbf{y}^t\|\,|y^t\rangle\Big]|0\rangle + |0\rangle^\perp.$$

The main cost of the above learning algorithm for $\mathbf{v}_j$ lies in the computation of
$z_j^t$, so after $N$ iterations the complexity is
$$O(N^{3/2}(\log mn)\sqrt{n}/\epsilon^2). \qquad (25)$$
The reason for the $N^{3/2}$ factor is the same as in (13). The classical online learning algorithm for $\mathbf{v}_j$ has
complexity $O(Nmn)$. Compared with the classical algorithm, the quantum online learning
algorithm achieves a quadratic speedup in $n$ and an exponential speedup in $m$; however,
the dependence on the number of iterations is worse. If we use the sigmoid function
given in remark 2, the corresponding quantum online learning algorithm can also
achieve an exponential speedup in $n$ over the classical online learning algorithm.
The online learning rule (17) for $\mathbf{w}_j^{\rm new}$ needs the information of the summation
$\sum_i (r_i^t - z_i^t)z_i^t(1-z_i^t)v_{ij}$. This time we need to calculate the $z_i^t$ in parallel. From (21),
with $|Y\rangle$ replaced by $|Y^t\rangle$, we can get
$$\frac{1}{\sqrt{p}}\sum_{i=0}^{p-1}|i\rangle \otimes
\Big[\sqrt{\tilde{s}(r_i^t - z_i^t)z_i^t(1-z_i^t)v_{ij}}\,|0\rangle
+ \sqrt{1 - \tilde{s}(r_i^t - z_i^t)z_i^t(1-z_i^t)v_{ij}}\,|1\rangle\Big], \qquad (26)$$
where $\tilde{s} = 1/\max_i\{(r_i^t - z_i^t)z_i^t(1-z_i^t)v_{ij}\}$. Here we assume $(r_i^t - z_i^t)z_i^t(1-z_i^t)v_{ij} > 0$;
otherwise we can add another qubit to deal with the negative parts. In the
first part, the probability of $|0\rangle$ equals $\tilde{s}p^{-1}\sum_i(r_i^t - z_i^t)z_i^t(1-z_i^t)v_{ij}$. By swap test, the
summation $\sum_i(r_i^t - z_i^t)z_i^t(1-z_i^t)v_{ij}$ can be estimated to precision $\epsilon$ in time
$$O((\log mnp)\sqrt{np}/\epsilon^2). \qquad (27)$$
Similarly to the learning algorithm for $\mathbf{v}_j$, except that $|Y^t\rangle$ is now replaced by
$|x^t\rangle$, the online learning algorithm for $\mathbf{w}_j$ can be implemented on a quantum computer in
time
$$O(N^{3/2}(\log mnp)\sqrt{np}/\epsilon^2), \qquad (28)$$
where $N$ is the number of iterations. The classical online learning algorithm for $\mathbf{w}_j$
has complexity $O(n(m+p))$ per iteration. Therefore, the corresponding quantum learning algorithm
achieves an exponential speedup in $m$ and a quadratic speedup in $n$ and $p$. An exponential
speedup in $n$ and $p$ can be achieved by choosing a suitable sigmoid function as discussed in
remark 2.

5.4. Simulation in quantum computer: batch learning algorithm


In this subsection, we focus on modelling the batch learning algorithm on a quantum
computer. The whole idea is the same as for reading the data $\mathbf{y}$ and $\mathbf{z}$. Due to
the complicated expressions (15) and (16), the corresponding quantum procedures are
complicated too; however, they are easy to understand.
First, we show how to achieve the updating rule (15) for $\mathbf{v}_j$ on a quantum computer. It
contains a summation over $t = 0, 1, \ldots, d-1$, which can be achieved by considering (19)
and (22) in parallel. The quantum information required by the updating rule includes:
(1) the quantum state of $\mathbf{y}^t = (\varphi(\mathbf{x}^t\cdot\mathbf{w}_0), \ldots, \varphi(\mathbf{x}^t\cdot\mathbf{w}_{n-1}))$, which is contained in (19); the
quantum state in (19) will be denoted by $|Y^t\rangle$; (2) the inner product $z_j^t = \varphi(\mathbf{y}^t\cdot\mathbf{v}_j)$,
which can be obtained by applying swap test between $|Y^t\rangle$ and $|v_j\rangle$, see equation (21).
We can now describe the basic idea of the updating rule for $\mathbf{v}_j$ on a quantum computer:

Algorithm 5: Batch training of the weights in the output layer on a quantum computer

Step 1. Prepare $\frac{1}{\sqrt{d}}\sum_{t=0}^{d-1}|t\rangle$.

Step 2. Prepare the $\mathbf{y}^t$ in parallel: $\frac{1}{\sqrt{d}}\sum_{t=0}^{d-1}|t\rangle|Y^t\rangle$, where $|Y^t\rangle$ is the state obtained in (19)
with $\mathbf{x}$ replaced by $\mathbf{x}^t$.

Step 3. Apply a controlled operation to prepare
$$\frac{1}{\sqrt{d}}\sum_{t=0}^{d-1}|t\rangle|Y^t\rangle \otimes \frac{1}{\sqrt{1+\|\mathbf{v}_j\|^2}}
\Big[|Y^t\rangle|0\rangle + \|\mathbf{v}_j\|\,|v_j\rangle|0\rangle|1\rangle\Big].$$

Step 4. Apply parallel swap test to obtain the information of $z_j^t$, that is, to prepare
$$\frac{1}{\sqrt{d}}\sum_{t=0}^{d-1}|t\rangle|Y^t\rangle \otimes \frac{1}{\sqrt{1+\|\mathbf{v}_j\|^2}}
\Big[|Y^t\rangle|0\rangle + \|\mathbf{v}_j\|\,|v_j\rangle|0\rangle|1\rangle\Big]\,|\mathbf{y}^t\cdot\mathbf{v}_j\rangle.$$

Step 5. Apply a controlled rotation and undo the parallel swap test to prepare
$$\frac{1}{\sqrt{d}}\sum_{t=0}^{d-1}|t\rangle|Y^t\rangle\Big[s\eta(r_j^t - z_j^t)z_j^t(1-z_j^t)|0\rangle
+ \sqrt{1 - \big(s\eta(r_j^t - z_j^t)z_j^t(1-z_j^t)\big)^2}\,|1\rangle\Big],$$
where $s = 1/\max_t \eta(r_j^t - z_j^t)z_j^t(1-z_j^t) \approx 1/4\eta$.

Step 6. Apply a Hadamard transformation on the first register $|t\rangle$:
$$\frac{s}{d\sqrt{n}}\sum_{t=0}^{d-1}\eta(r_j^t - z_j^t)z_j^t(1-z_j^t)\|\mathbf{y}^t\|\,|y^t\rangle|0\rangle + |0\rangle^\perp.$$
Denote this state by $|V_j\rangle$.

Step 7. Prepare
$$\frac{1}{\sqrt{d^2n/s^2 + \|\mathbf{v}_j^{\rm old}\|^2}}
\Big[\|\mathbf{v}_j^{\rm old}\|\,|v_j^{\rm old}\rangle|0\rangle|+\rangle + \frac{d\sqrt{n}}{s}|V_j\rangle|-\rangle\Big]
= \frac{1}{\sqrt{d^2n/s^2 + \|\mathbf{v}_j^{\rm old}\|^2}}\,\|\mathbf{v}_j^{\rm new}\|\,|v_j^{\rm new}\rangle|0\rangle + |0\rangle^\perp. \qquad (29)$$

Similarly to the complexity analysis of (24), the whole complexity to obtain the quantum
state (29) is
$$O((\log dmn)\sqrt{n}/\epsilon^2). \qquad (30)$$
Next, we consider the simulation of the updating rule (16) for $\mathbf{w}_j$ on a quantum computer.
This requires more information than updating $\mathbf{v}_j$: (1) the quantum states of all $\mathbf{x}^t$, which
can be prepared in advance; (2) the summation $\sum_i(r_i^t - z_i^t)z_i^t(1-z_i^t)v_{ij}$. The underlying
idea is similar, but the description is a little more complicated than above because
of the summation.

Algorithm 6: Batch training of the weights in the hidden layer on a quantum computer

Step 1. Prepare $\frac{1}{\sqrt{d}}\sum_{t=0}^{d-1}|t\rangle \otimes \frac{1}{\sqrt{p}}\sum_{i=0}^{p-1}|i\rangle$.

Step 2. Apply a controlled operation to prepare
$$\frac{1}{\sqrt{d}}\sum_{t=0}^{d-1}|t\rangle|x^t\rangle \otimes \frac{1}{\sqrt{\|\mathbf{x}^t\|^2 + \|\mathbf{w}_j\|^2}}
\Big[\|\mathbf{x}^t\|\,|x^t\rangle|0\rangle + \|\mathbf{w}_j\|\,|w_j\rangle|1\rangle\Big]
\otimes \frac{1}{\sqrt{p}}\sum_{i=0}^{p-1}|i\rangle \otimes \frac{1}{\sqrt{1+\|\mathbf{v}_i\|^2}}
\Big[|Y^t\rangle|0\rangle + \|\mathbf{v}_i\|\,|v_i\rangle|0\rangle|1\rangle\Big].$$

Step 3. Apply parallel swap test and undo it to prepare
$$\frac{1}{\sqrt{d}}\sum_{t=0}^{d-1}|t\rangle|x^t\rangle|y_j^t\rangle \otimes \frac{1}{\sqrt{p}}\sum_{i=0}^{p-1}|i\rangle
\otimes \frac{1}{\sqrt{1+\|\mathbf{v}_i\|^2}}\Big[|Y^t\rangle|0\rangle + \|\mathbf{v}_i\|\,|v_i\rangle|0\rangle|1\rangle\Big]\,|z_i^t\rangle. \qquad (31)$$

Step 4. Apply a controlled rotation to prepare
$$\frac{1}{\sqrt{d}}\sum_{t=0}^{d-1}|t\rangle|x^t\rangle|y_j^t\rangle \otimes \frac{1}{\sqrt{p}}\sum_{i=0}^{p-1}|i\rangle
\Big[\sqrt{\tilde{s}(r_i^t - z_i^t)z_i^t(1-z_i^t)v_{ij}}\,|0\rangle
+ \sqrt{1 - \tilde{s}(r_i^t - z_i^t)z_i^t(1-z_i^t)v_{ij}}\,|1\rangle\Big],$$
where $\tilde{s} = 1/\max_i(r_i^t - z_i^t)z_i^t(1-z_i^t)v_{ij} \approx 1/4\max_i v_{ij}$ and we assume that
$(r_i^t - z_i^t)z_i^t(1-z_i^t)v_{ij} \ge 0$; otherwise we can use another copy to deal with the
negative parts. The probability of $|0\rangle$ equals $p^{-1}\tilde{s}\sum_{i=0}^{p-1}(r_i^t - z_i^t)z_i^t(1-z_i^t)v_{ij}$.

Step 5. Apply parallel swap test to prepare
$$\frac{1}{\sqrt{d}}\sum_{t=0}^{d-1}|t\rangle|x^t\rangle|y_j^t\rangle \otimes \frac{1}{\sqrt{p}}\sum_{i=0}^{p-1}|i\rangle
\Big[\sqrt{\tilde{s}(r_i^t - z_i^t)z_i^t(1-z_i^t)v_{ij}}\,|0\rangle
+ \sqrt{1 - \tilde{s}(r_i^t - z_i^t)z_i^t(1-z_i^t)v_{ij}}\,|1\rangle\Big]
\otimes \Big|\sum_{i=0}^{p-1}(r_i^t - z_i^t)z_i^t(1-z_i^t)v_{ij}\Big\rangle.$$

Step 6. Undo Steps 2-5 to prepare
$$\frac{1}{\sqrt{d}}\sum_{t=0}^{d-1}|t\rangle|x^t\rangle \otimes
\Bigg[\hat{s}\Big(\sum_{i=0}^{p-1}(r_i^t - z_i^t)z_i^t(1-z_i^t)v_{ij}\Big)y_j^t(1-y_j^t)\|\mathbf{x}^t\|\,|0\rangle
+ \sqrt{1 - \hat{s}^2\Big(\sum_{i=0}^{p-1}(r_i^t - z_i^t)z_i^t(1-z_i^t)v_{ij}\,y_j^t(1-y_j^t)\|\mathbf{x}^t\|\Big)^2}\,|1\rangle\Bigg],$$
where $\hat{s} = 1/\max_t\{\sum_{i=0}^{p-1}(r_i^t - z_i^t)z_i^t(1-z_i^t)v_{ij}\,y_j^t(1-y_j^t)\|\mathbf{x}^t\|\}
\approx 1/4p\max_t\|\mathbf{x}^t\|\max_i v_{ij}$.

Step 7. Apply a Hadamard transformation on the first register $|t\rangle$, which yields
$$\frac{\hat{s}}{d}\sum_{t=0}^{d-1}\sum_{i=0}^{p-1}(r_i^t - z_i^t)z_i^t(1-z_i^t)v_{ij}\,y_j^t(1-y_j^t)\|\mathbf{x}^t\|\,|x^t\rangle|0\rangle + |0\rangle^\perp.$$
Denote this state by $|W_j\rangle$.

Step 8. Prepare
$$\frac{1}{\sqrt{d^2/\hat{s}^2 + \|\mathbf{w}_j^{\rm old}\|^2}}
\Big[\|\mathbf{w}_j^{\rm old}\|\,|w_j^{\rm old}\rangle|0\rangle|+\rangle + \frac{d}{\hat{s}}|W_j\rangle|-\rangle\Big]
= \frac{1}{\sqrt{d^2/\hat{s}^2 + \|\mathbf{w}_j^{\rm old}\|^2}}\,\|\mathbf{w}_j^{\rm new}\|\,|w_j^{\rm new}\rangle|0\rangle + |0\rangle^\perp. \qquad (32)$$

Next, we give a complexity analysis of the above procedure. The first step
costs $O(\log dp)$. The cost of the second step is the preparation of $|Y^t\rangle$, which
equals $O((\log mn)/\epsilon)$ by (20). The third step needs swap test to estimate
$z_i^t$; however, $|Y^t\rangle$ carries a coefficient $1/\sqrt{n}$ in (31), so the cost of this step is
$O((\log dmnp)\sqrt{n}/\epsilon^2)$. The fourth step costs $O(1)$. Because of the factor $1/\sqrt{p}$, the fifth
step costs $O((\log dmnp)\sqrt{np}/\epsilon^2)$. The sixth step costs $O(1)$, and the seventh and eighth
steps cost $O(\log d)$. Therefore, the total cost of the eight steps is
$$O((\log dmnp)\sqrt{np}/\epsilon^2). \qquad (33)$$
Further updates of $\mathbf{v}_j$ and $\mathbf{w}_j$ proceed as in the above two algorithms; the difference
is that in the subsequent updates we should use the weight states (29) and (32)
instead of $|v_j\rangle, |w_j\rangle$. Assume that the learning algorithm stops after $N$ iterations;
then the complexities of updating $\mathbf{v}_j$ and $\mathbf{w}_j$ are
$$O((\log dmn)\sqrt{n}/\epsilon^N) \quad\text{and}\quad O((\log dmnp)\sqrt{np}/\epsilon^N), \qquad (34)$$
respectively. By a similar analysis to remark 2, the complexity (34) can be improved to
$O((\log dmn)/\epsilon^N)$ and $O((\log dmnp)/\epsilon^N)$ by choosing suitable sigmoid functions in each
layer. If the number $N$ of iterations is not large, this achieves an exponential
speedup over the classical batch learning algorithm, whose complexity is $O(Ndmn)$ and
$O(Ndn(m+p))$, respectively.

The complexity of the batch learning algorithm (34) is exponential in the number $N$
of iterations. Just as for online learning, we can compute all the desired coefficients
in advance. This reduces the dependence on $N$ to polynomial; however, the
dependence on $d, m, n, p$ becomes worse than (34).

6. Learning algorithm of Hopfield neural network in a quantum computer

The Hopfield neural network (HNN) is a model of recurrent neural network [15] with
Hebb's learning rule [14]. HNNs serve as associative memory systems with binary
threshold neurons; a variant of the HNN with continuous input range was proposed
in [16]. The method introduced in this paper cannot be directly applied to learn an HNN on a
quantum computer, but an amended version still works. In this section, we will
not focus too much on the details of the learning algorithm of the HNN on a quantum computer;
the reader can find more about HNNs in [13] and [24, Chapter 13], and a brief
introduction in [29]. The idea of the quantum HNN model applied here is different from [23],
which studies the HNN via density matrices and related techniques (such as quantum principal
component analysis and the HHL algorithm).
Assume that $X = (x_{ij})_{P\times N}$ is a $P\times N$ matrix whose rows store the firing patterns
of the HNN. Denote the $i$-th row of $X$ by $\mathbf{u}_i$ and the $j$-th column of $X$ by $\mathbf{v}_j$. The weight matrix
$W = (w_{ij})_{N\times N}$ of the HNN satisfies $w_{ij} = w_{ji}$ and $w_{ii} = 0$. Denote the $j$-th column of $W$
by $\mathbf{w}_j$. The Hebb's learning rule of the HNN is defined as
$$w_{ij} = \frac{1}{P}\,\mathbf{v}_i\cdot\mathbf{v}_j, \qquad (35)$$
and the update rule of $X$ is defined by
$$x_{ij} = \varphi(\mathbf{u}_i\cdot\mathbf{w}_j), \qquad (36)$$
where $\varphi$ is the threshold function (10) if all $x_{ij}$ take discrete values in $\{-1, 1\}$, or the
sigmoid function (14) if the $x_{ij}$ have continuous ranges.
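A minimal classical rendering of Hebb's rule (35) and one synchronous update sweep (36) (a reference sketch with toy binary patterns; the quantum preparation of $|W\rangle$ is described next) is:

```python
import numpy as np

def hebb_weights(X):
    """Hebb's rule (35): w_ij = (1/P) v_i . v_j, with zero diagonal."""
    P, N = X.shape
    W = (X.T @ X) / P
    np.fill_diagonal(W, 0.0)
    return W

def update(X, W):
    """One update sweep (36) with the threshold function (10)."""
    return np.where(X @ W > 0, 1.0, -1.0)

patterns = np.array([[1, -1, 1, -1],
                     [1, 1, -1, -1]], dtype=float)
W = hebb_weights(patterns)
print(update(patterns, W))   # the stored patterns are (approximately) fixed points
```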

Since the input of the HNN is the matrix $X$, it is better to consider the quantum state
$$|X\rangle = \frac{1}{\|X\|_F}\sum_{i=0}^{P-1}\sum_{j=0}^{N-1} x_{ij}|i,j\rangle
= \frac{1}{\|X\|_F}\sum_{i=0}^{P-1}\|\mathbf{u}_i\|\,|i, u_i\rangle
= \frac{1}{\|X\|_F}\sum_{j=0}^{N-1}\|\mathbf{v}_j\|\,|v_j, j\rangle, \qquad (37)$$
where $\|X\|_F = \sqrt{\sum_{i,j}|x_{ij}|^2}$ is the Frobenius norm of $X$. Similarly, one can define $|W\rangle$.
We consider the problem of obtaining the quantum state $|W\rangle$ given only $|X\rangle$. Consider
the quantum state
$$\frac{1}{N}\sum_{i,j=0}^{N-1}|i,j\rangle|+\rangle|X\rangle
= \frac{1}{N\|X\|_F}\sum_{i,j,k=0}^{N-1}\|\mathbf{v}_k\|\,|i,j\rangle|+\rangle|v_k, k\rangle. \qquad (38)$$
For any $i, j$, define
$$\delta_{i,j}^{(0)} = \begin{cases} 1, & \text{if } i = j; \\ 0, & \text{if } i \ne j, \end{cases} \qquad
\delta_{i,j}^{(1)} = \begin{cases} 1, & \text{if } i = j; \\ 2, & \text{if } i \ne j. \end{cases}$$
For $|i,j\rangle|0\rangle|v_k,k\rangle$, we change it into $|i,j\rangle|0\rangle|v_k,k\rangle|\delta_{i,k}^{(0)}\rangle$; when $i = k$, we produce a
copy of $|j\rangle$ to get $|i,j\rangle|0\rangle|v_i,i,j\rangle|1\rangle$. Similarly, for $|i,j\rangle|1\rangle|v_k,k\rangle$, we change it into
$|i,j\rangle|1\rangle|v_k,k\rangle|\delta_{j,k}^{(1)}\rangle$, and when $j = k$ we produce a copy of $|i\rangle$ to get $|i,j\rangle|1\rangle|v_j,i,j\rangle|1\rangle$.
Performing these operations on (38), we obtain
$$\frac{1}{\sqrt{2}N\|X\|_F}\sum_{i,j=0}^{N-1}|i,j\rangle
\Big[|0\rangle\big(\|\mathbf{v}_i\|\,|v_i,i,j\rangle|1\rangle + |\phi_0\rangle|0\rangle\big)
+ |1\rangle\big(\|\mathbf{v}_j\|\,|v_j,i,j\rangle|1\rangle + |\phi_2\rangle|2\rangle\big)\Big], \qquad (39)$$
where $|\phi_0\rangle, |\phi_2\rangle$ are some undesired quantum states. Note that the inner product of
$\|\mathbf{v}_i\|\,|v_i,i,j\rangle|1\rangle + |\phi_0\rangle|0\rangle$ and $\|\mathbf{v}_j\|\,|v_j,i,j\rangle|1\rangle + |\phi_2\rangle|2\rangle$ equals
$\|\mathbf{v}_i\|\|\mathbf{v}_j\|\langle v_i|v_j\rangle = P w_{ij}$. By Proposition 2, and undoing the above procedure, we obtain
$$\frac{P}{N\|X\|_F}\sum_{i,j=0}^{N-1}|i,j\rangle\Big[s w_{ij}|0\rangle + \sqrt{1 - s^2 w_{ij}^2}\,|1\rangle\Big]
= \frac{P s\|W\|_F}{N\|X\|_F}|W\rangle|0\rangle + |0\rangle^\perp, \qquad (40)$$
where $s = 1/\max_{i,j}|w_{ij}|$. The whole cost to obtain (40) is $O((\log(PN))/\epsilon)$. The update
rule (36) can be handled similarly by considering $\frac{1}{\sqrt{2}}(|0\rangle|X\rangle + |1\rangle|W\rangle)$ instead of
$|+\rangle|X\rangle$ in (38). When the number of iterations is not large, this algorithm achieves an
exponential speedup over the classical Hebb's learning rule of the HNN. The above idea was
applied in [32] to study matrix multiplication by parallel swap test, where it achieves
a much better performance than SVE [21] or the HHL algorithm [12]. Here we see that a
similar idea also plays an important role in the learning of the HNN on a quantum computer.

7. Conclusions

In this paper, we establish a model of the multilayer perceptron and a learning algorithm
of the Hopfield network on a quantum computer, based on the parallel swap test technique. The
performance is much better than the classical algorithms when the number of layers
or the number of iterations is not large. On one hand, the parallel swap test
technique plays an important role in the design of the quantum model of the multilayer
perceptron and the Hopfield network, so finding more applications of parallel swap test
is a worthwhile research problem; we have already found one such application
in the Tikhonov regularization problem, which is used to deal with ill-posed inverse
problems. On the other hand, with this quantum model of the multilayer perceptron and
the Hopfield network, it remains a problem to find its applications in quantum machine
learning. Moreover, the learning algorithm of the multilayer perceptron is based on gradient
descent, which has a low convergence rate, so it may be worth considering learning
algorithms based on Newton's or quasi-Newton's methods, where quantum computers
can also achieve certain speedups under certain conditions. Finally, it remains a problem
to generalize this model to study multilayer perceptrons with more than two layers.

Acknowledgments

This work is supported by the NSFC Project 11671388 and the CAS Frontier Key
Project QYZDJ-SSW-SYS022.

References

[1] Alpaydin E 2015 Introduction to Machine Learning (3rd edition, MIT press)
[2] Altaisky M V 2001 Quantum neural network arXiv:quant-ph/0107012v2
[3] Andrecut M, Ali M K 2002 Int. J. Mod. Phys. C 13 75
[4] Behrman E C, Nash L, Steck J E, Chandrashekar V, Skinner S R 2002 Inf. Sci. 128 257
[5] Berry D W, Childs A M, Cleve R, Kothari R, Somma R D 2015 Phys. Rev. Lett. 114 090502
[6] Bishop C M 1996 Neural Networks for Pattern Recognition (1st edition, Clarendon Press, Oxford)
[7] Bourlard H, Kamp Y 1988 Biological Cybernetics 59 291
[8] Buhrman H, Cleve R, Watrous J, Wolf R de 2001 Phys. Rev. Lett. 87 167902
[9] Childs A M, Kothari R, Somma R D 2017 SIAM J. Comput. 46 1920
[10] Clader B D, Jacobs B C, Sprouse C R 2013 Phys. Rev. Lett. 110 250504
[11] Fukunaga K 1990 Statistical Pattern Recognition (2nd edition, Academic Press, New York)
[12] Harrow A W, Hassidim A, Lloyd S 2009 Physical review letters 103 150502
[13] Haykin S 2009 Neural Networks and Learning Machines (3rd edition, Pearson)
[14] Hebb D O 2002 The organization of behavior: A neuropsychological theory (Lawrence Erlbaum)
[15] Hopfield J J 1982 Proc. Nat. Acad. Sci. 79 2554
[16] Hopfield J J 1984 Proc. Nat. Acad. Sci. 81 3088
[17] Hornik K, Stinchcombe M, White H 1989 Neural Networks 2 359
[18] Hui C-L 2011 Artificial Neural Networks-Application (InTech)
[19] Jain A K, Mao J C, Mohiuddin K M 1996 Computer 29 31
[20] Kak C S 1995 Adv. Imaging Electron Phys. 94 259
[21] Kerenidis I, Prakash A 2017 ITCS 49 1
[22] McCulloch W S, Pitts W 1943 Bull. Math. Biol. 5 115
[23] Rebentrost P, Bromley T R, Weedbrook C, Lloyd S 2017 A Quantum Hopfield Neural Network
arXiv:1710.03599v2
[24] Rojas R 1996 Neural Networks: A Systematic Introduction (Springer)
[25] Ricks B, Ventura D 2003 Advances in Neural Information Processing Systems: Proceedings of the
2003 Conference vol 16 (A Bradford Book) p 1
[26] Rosenblatt F 1958 Psychological Review 65 386
[27] Rumelhart D E, Hinton G E, Williams R J 1986 Nature 323 533
[28] Schiffman W H, Geffers H W 1993 Neural Networks 6 517
[29] Schuld M, Sinayskiy I, Petruccione F 2014 Quantum Inf. Process 13 2567
[30] Schuld M, Sinayskiy I, Petruccione F 2015 Phys. Lett. A 379 660
[31] Schuld M, Petruccione F 2018 Supervised Learning with Quantum Computers (Springer)
[32] Shao C P 2018 Quantum Algorithms to Matrix Multiplication arXiv:1803.01601v2
[33] Shao C P 2018 From linear combination of quantum states to Grover’s searching algorithm
arXiv:1807.09693v2
[34] Silva A J da, Ludermir T B, Oliveira W R de 2016 Neural Networks 76 55
[35] Siomau M 2014 Quantum Inf. Processing 13 1211
[36] Suykens J A K, Vandewalle J P L, DeMoor B L R 1996 Artificial Neural Networks for Modeling
and Control of Non-Linear Systems (Kluwer, Dordrecht)
[37] Wan K H, Dahlsten O, Kristjánsson H, Gardner R, Kim M S 2017 npj Quantum Inf. 3 36
