A Quantum Model For Multilayer Perceptron
Changpeng Shao
Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
E-mail: [email protected]
1. Introduction
Inspired by biological neural networks, artificial neural networks (ANNs) are massively
parallel computing models consisting of an extremely large number of simple processors
with many interconnections. ANNs are among the most successful tools in machine learning. Researchers from different scientific disciplines are designing different ANN
models to solve various problems in pattern recognition, clustering, approximation,
prediction, optimization, control and so on [6, 11, 17, 18, 19, 28, 36]. Hence ANNs
are of great interest for quantum adaptation. Quantum neural networks (QNNs) are
models combining the powerful features of quantum computing (such as superposition
and entanglement) and ANNs (such as parallel computing). However, the nonlinear dynamics of ANNs are very different from the unitary operations (which are linear) of quantum computing. Finding a meaningful QNN model that integrates both fields is a highly nontrivial task [29]. Many QNN models have been proposed [2, 3, 4, 20, 25, 29, 30, 34, 35, 37].
In [2], Altaisky introduced a quantum perceptron modelled by the quantum updating function $|y(t)\rangle = \hat{F}\sum_i \hat{w}_{iy}(t)\,|x_i\rangle$, where $\hat{F}$ is an arbitrary unitary
operator and $\hat{w}_{iy}(t)$ is an operator representing the weights. The training rule is $\hat{w}_{iy}(t+1) = \hat{w}_{iy}(t) + \eta(|d\rangle - |y(t)\rangle)\langle x_i|$, where $|d\rangle$ is the quantum state of the target vector, $|y(t)\rangle$ is the output state of the quantum perceptron and $\eta$ is a learning parameter. The author notes that this training rule is by no means unitary in the components of the weight matrix, so it fails to preserve unitarity and hence the total probability of the system.
A large class of ANNs uses binary McCulloch-Pitts neurons [22], in which the neural cells are assumed to be either active or resting. The inputs and outputs take values in $\{-1,1\}$, so there is a natural connection between the inputs $\{-1,1\}$ and the qubits $|0\rangle, |1\rangle$. All possible inputs $x_1,\ldots,x_n \in \{-1,1\}$ naturally correspond to a quantum state $\sum_{x_1,\ldots,x_n\in\{-1,1\}} \alpha_{x_1,\ldots,x_n}\,|x_1,\ldots,x_n\rangle$, where $\alpha_{x_1,\ldots,x_n}$ is the amplitude of $|x_1,\ldots,x_n\rangle$. In [30], Schuld et al. considered the simulation of a perceptron on a quantum computer in this model. They encode the signals $x_1,\ldots,x_n\in\{-1,1\}$ into the qubits $|x_1,\ldots,x_n\rangle$, so that the output of the perceptron under the threshold function can be computed to some precision by applying quantum phase estimation to a unitary operator generated by the weights. This model uses $n$ qubits, so the complexity of the method given in [30] is linear in $n$. For an introduction to other QNN models, we refer to [29].
In this paper, we study the multilayer perceptron (MLP) on a quantum computer from a different angle, namely encoding the $x_i$ into the amplitudes of qubits to generate the quantum state $|x\rangle = \frac{1}{\|x\|}\sum_i x_i |i\rangle$, where $x = (x_1,\ldots,x_n)$ is the vector of input signals. This idea has been applied in [23] to study the quantum Hopfield neural network. One apparent advantage is that it only uses $O(\log n)$ qubits, so it may perform better than the model used in [30, 31]. Also, in the quantum state $|x\rangle$ the input signal $x_i$ can take any value, so it is more general than the model considered in [30]. Similarly, we can generate the quantum state $|w\rangle$ of the weight vector $w = (w_1,\ldots,w_n)$. The output of an MLP can be any nonlinear function of $x\cdot w := \sum_i x_i w_i$. Computing $x\cdot w$ (or $\langle x|w\rangle$ when we are only interested in the sign) is not difficult on a quantum computer thanks to the swap test [8]. Moreover, any function of this value can be obtained by adding an ancilla qubit, just as in the HHL algorithm [12]. This shows another advantage of the representation $|x\rangle$ of the input signal vector $x$.
However, reading the output is just the initial step of studying the quantum MLP. A big challenge for the quantum MLP is the learning algorithm, because of the nonlinear structure of MLP. Learning (or self-learning) ability is one of the most significant features of MLP: many applications are achieved by first constructing a suitable network and then letting the MLP learn the desired results from training data. So establishing the learning algorithm of MLP on a quantum computer is necessary and important. In this paper, we provide a method to integrate the nonlinear structure of MLP into a quantum computer while preserving the good features of MLP (e.g., parallelism) and of quantum computing (e.g., superposition) simultaneously. This provides a new model for studying MLP and its applications in quantum machine learning.
The integration is achieved by a new technique called the parallel swap test, proposed in a previous paper [32] to study matrix multiplication. It is a generalization of the swap test
[8], but in a parallel form. Simply speaking, the parallel swap test can output quantum information in parallel, which is required in the construction of the quantum MLP. With this technique, we will show that the computation of the output signals and the learning algorithms (online and batch) of the weights in the quantum MLP can be achieved much faster (at least quadratically, and in some cases exponentially, faster) than the classical versions. Finally, as a further illustration, we will study Hebb's learning rule for the Hopfield network. This also achieves an exponential speedup, in the number of input neurons, over the classical learning algorithm.
The structure of this paper is as follows. In section 2, we extend the swap test technique into a parallel form, which will play a central role in the simulation of the quantum MLP. In section 3, we propose a technique to prepare quantum states that is exponentially faster than previous quantum algorithms. Although it cannot prepare the quantum state of every vector efficiently, it is fast enough for many practical problems. In section 4, we focus on the simulation of a perceptron. The simulation of MLP on a quantum computer, which includes reading the output data and the back-propagation learning algorithm, is studied in section 5. Finally, in section 6, we study the learning algorithm of the Hopfield neural network on a quantum computer.
Notation. We will use bold letters $x, y, z, \ldots$ to denote vectors and $|x\rangle, |y\rangle, |z\rangle, \ldots$ to denote their quantum states. The norm $\|\cdot\|$ always refers to the 2-norm of vectors.
2. Parallel swap test
The swap test was first proposed in [8] as an application of the quantum phase estimation algorithm and Grover's search; it can be used to estimate the probabilities or amplitudes of certain desired quantum states. It plays an important role in many quantum machine learning algorithms for estimating the inner product of two quantum states. In the following, we first briefly review the underlying problem the swap test addresses and the basic procedure to solve it. Then we review the generalized form of the swap test considered in a previous work [32] on matrix multiplication in a quantum computer. This generalization will play a central role in our construction of the model of multilayer perceptron on a quantum computer.
Let
$$|\phi\rangle = \sin\theta\,|0\rangle|u\rangle + \cos\theta\,|1\rangle|v\rangle \qquad\qquad (1)$$
be an unknown quantum state that can be prepared in time $O(T_{in})$, where $|u\rangle, |v\rangle$ are normalized quantum states and $\theta$ is an unknown angle parameter. The problem is how to estimate $\theta$ on a quantum computer to precision $\epsilon$ with a high success probability.
Suppose that $|\phi\rangle$ comes from some other quantum algorithm, which means there is a given unitary $U$, implementable in time $O(T_{in})$, such that $|\phi\rangle = U|0\rangle$. Let $Z$ be the 2-dimensional unitary transformation that maps $|0\rangle$ to $-|0\rangle$ and $|1\rangle$ to $|1\rangle$, which is
instead of outputting it. Moreover, from (3) and (4), we can actually obtain the quantum state
$$\frac{1}{\sqrt{2}}\Big(|0\rangle|x\rangle + |1\rangle|y\rangle\Big)\Big|f(\langle x|y\rangle)\Big\rangle,$$
for any function $f$, since the cosine function is even. The case of complex quantum states can be handled similarly by considering $\langle x|y\rangle$ and $\langle x|i|y\rangle$ in parallel. Summarizing the above analysis, we have
Proposition 2. Let $|x\rangle, |y\rangle$ be two quantum states that can be prepared in time $O(T_{in})$, and let $f$ be any function. Then there is a quantum algorithm that runs in time $O(T_{in}/\epsilon)$ and achieves
$$\frac{1}{\sqrt{2}}(|0\rangle|x\rangle + |1\rangle|y\rangle) \mapsto \frac{1}{\sqrt{2}}(|0\rangle|x\rangle + |1\rangle|y\rangle)|f(s)\rangle, \qquad\qquad (5)$$
where $|\langle x|y\rangle - s| \le \epsilon$.
From Proposition 2, it is easy to obtain the following results.
Theorem 1 (Parallel Swap Test). Given $2N$ quantum states $|u_0\rangle, |v_0\rangle, \ldots, |u_{N-1}\rangle, |v_{N-1}\rangle$, which can be prepared in time $O(T_{in})$, and $N$ functions $f_0,\ldots,f_{N-1}$, there is a quantum algorithm with runtime $O(T_{in}/\epsilon)$ that produces the quantum state
$$\frac{1}{\sqrt{N}}\sum_{j=0}^{N-1}|j\rangle|f_j(s_j)\rangle, \qquad\qquad (6)$$
where $|s_j - \langle u_j|v_j\rangle| \le \epsilon$.
Proof. The state (6) can be obtained by applying (5) in parallel, thanks to the control register. More precisely, first construct the quantum state $\frac{1}{\sqrt{N}}\sum_{j=0}^{N-1}|j\rangle$. Then, viewing $|j\rangle$ as a control register, prepare $|\phi_j\rangle := \frac{1}{\sqrt{2}}(|u_j\rangle|0\rangle + |v_j\rangle|1\rangle)$, so that we efficiently obtain a quantum state of the form $\frac{1}{\sqrt{N}}\sum_{j=0}^{N-1}|j\rangle|\phi_j\rangle$. For each $|\phi_j\rangle$, perform the transformation (5) to extract the information of $f_j(\langle u_j|v_j\rangle)$, which yields $\frac{1}{\sqrt{N}}\sum_{j=0}^{N-1}|j\rangle|\phi_j\rangle|f_j(\langle u_j|v_j\rangle)\rangle$. As discussed above, this is achieved by performing quantum phase estimation on a unitary operator $G_j$ that depends on $|\phi_j\rangle$; hence the quantum state $\frac{1}{\sqrt{N}}\sum_{j=0}^{N-1}|j\rangle|\phi_j\rangle|f_j(\langle u_j|v_j\rangle)\rangle$ is obtained by performing quantum phase estimation on $\sum_j |j\rangle\langle j| \otimes G_j$. Finally, we get the desired state (6) by undoing the preparation of $|\phi_j\rangle$.
Corollary 1. For any given quantum state $\sum_j \alpha_j|j\rangle$, which can be prepared in time $O(T_{in})$, and any function $f$, we can obtain $\sum_j \alpha_j|j\rangle|f(\tilde{\alpha}_j)\rangle$ in time $O(T_{in}/\epsilon)$, where $|\alpha_j - \tilde{\alpha}_j| \le \epsilon$.
Proof. Denote the given quantum state by $|\phi\rangle = \sum_j \alpha_j|j\rangle$; then $\alpha_j = \langle j|\phi\rangle$. The desired result follows in the same way as in the proof of Theorem 1.
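To make the role of the swap test concrete, the following NumPy sketch mimics the overlap estimation behind Proposition 2 for real states: the state $\frac{1}{\sqrt{2}}(|0\rangle|x\rangle+|1\rangle|y\rangle)$ is built explicitly and the control qubit is sampled after a Hadamard, so that $P(0)=(1+\langle x|y\rangle)/2$. The amplitude-estimation step of the actual quantum algorithm, which yields the $O(T_{in}/\epsilon)$ scaling, is replaced here by plain sampling; the function name and shot count are illustrative choices, not part of the paper.
```python
import numpy as np

def swap_test_overlap(x, y, shots=100_000, rng=np.random.default_rng(0)):
    """Classical sketch of the overlap estimation behind Proposition 2 (real states).

    After a Hadamard on the control qubit of (|0>|x> + |1>|y>)/sqrt(2),
    P(control = 0) = (1 + <x|y>)/2, so sampling the control estimates <x|y>.
    """
    x = np.asarray(x, dtype=float); x = x / np.linalg.norm(x)
    y = np.asarray(y, dtype=float); y = y / np.linalg.norm(y)
    branch0 = (x + y) / 2.0              # amplitudes attached to control = |0>
    p0 = float(np.sum(branch0 ** 2))     # probability of measuring the control in |0>
    p0_est = float(np.mean(rng.random(shots) < p0))
    return 2.0 * p0_est - 1.0            # estimate of <x|y>

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, -1.0, 0.5, 1.0])
print(swap_test_overlap(x, y))                                 # sampled estimate
print(float(x @ y) / (np.linalg.norm(x) * np.linalg.norm(y)))  # exact overlap
```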
3. Quantum state preparation
For further application in the quantum model of multilayer perceptron, we discuss in this section one method to achieve quantum state preparation. Let $x = (x_0,\ldots,x_{m-1})$ be any complex vector; then its quantum state $|x\rangle = \frac{1}{\|x\|}\sum_{j=0}^{m-1} x_j|j\rangle$ can be prepared by the following simple procedure (the idea comes from Clader et al. [10]):
Step 1. Prepare $\frac{1}{\sqrt{m}}\sum_{j=0}^{m-1}|j\rangle$.
Step 2. Apply a controlled operation to load the components of $x$, that is, prepare
$$\frac{1}{\sqrt{m}}\sum_{j=0}^{m-1}|j\rangle\Big(t x_j|0\rangle + \sqrt{1-t^2|x_j|^2}\,|1\rangle\Big), \qquad\qquad (7)$$
where $t = 1/\max_j|x_j|$. The first part contains the information of $|x\rangle$, and $|0\rangle$ serves as a flag qubit distinguishing the desired part. The complexity to get $|x\rangle$ is $O(\sqrt{m}\,\max_j|x_j|(\log m)/\|x\|) = O((\log m)\max_j|x_j|/\min_j|x_j|)$ when all components of $x$ are nonzero. The case $\min_j|x_j| = 0$ can be handled by only considering the nonzero components of $x$. However, before performing any measurement, the complexity to get (7) is $O(\log m)$. Sometimes the quantum state (7), which contains the information of $|x\rangle$ and the norm $\|x\|$, is already enough to solve certain problems.
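As a small illustration of the procedure above, the following NumPy sketch (restricted to real vectors for simplicity, with an illustrative function name) computes the amplitudes of the state (7) and the post-selection probability $t^2\|x\|^2/m$ that governs the $O(\sqrt{m}\,\max_j|x_j|/\|x\|)$ amplitude-amplification cost.
```python
import numpy as np

def clader_style_encoding(x):
    """Classical sketch of the state (7) for a real vector x.

    Returns the amplitudes of the |0> branch (proportional to |x>) and the
    post-selection probability t^2 ||x||^2 / m of measuring the flag in |0>.
    """
    x = np.asarray(x, dtype=float)
    m = len(x)
    t = 1.0 / np.max(np.abs(x))
    branch0 = t * x / np.sqrt(m)                   # amplitudes on |j>|0>
    p_success = float(np.sum(branch0 ** 2))        # = t^2 ||x||^2 / m
    state_x = branch0 / np.linalg.norm(branch0)    # renormalized |0> branch = |x>
    return state_x, p_success

state, p = clader_style_encoding([0.5, -1.0, 2.0, 1.5])
print(state)    # proportional to x
print(p)        # probability of measuring the flag qubit in |0>
```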
A more efficient quantum algorithm is based on the linear-combination-of-unitaries technique (LCU for short), which was first proposed in [32] (see also [33] for more details). The LCU problem can be stated as follows: given $m$ complex numbers $\alpha_j$ and $m$ quantum states $|v_j\rangle$ ($j = 0,1,\ldots,m-1$) that can be prepared efficiently in time $O(T_{in})$, how can we prepare the quantum state $|y\rangle$ proportional to $y = \sum_{j=0}^{m-1}\alpha_j|v_j\rangle$, and what is the corresponding complexity? The following LCU idea was used to simulate Hamiltonians [5] and to solve linear systems [9].
Set $\alpha_j = r_j e^{i\theta_j}$, where $r_j > 0$ is the modulus of $\alpha_j$, and denote $s = \sum_{j=0}^{m-1} r_j$. Define the unitary operator $S$ by $S|0\rangle = \frac{1}{\sqrt{s}}\sum_{j=0}^{m-1}\sqrt{r_j}\,|j\rangle$. Then $|y\rangle$ can be obtained by the following procedure: prepare the initial state $\frac{1}{\sqrt{s}}\sum_{j=0}^{m-1}\sqrt{r_j}\,|j\rangle|0\rangle$ by $S$; then conditionally prepare $e^{i\theta_j}|v_j\rangle$ controlled on $|j\rangle$, so that we get $\frac{1}{\sqrt{s}}\sum_{j=0}^{m-1}\sqrt{r_j}\,e^{i\theta_j}|j\rangle|v_j\rangle$; finally, apply $S^{\dagger}$ to the register $|j\rangle$, which yields $\frac{1}{s}|0\rangle\sum_{j=0}^{m-1}\alpha_j|v_j\rangle$ plus orthogonal parts. It is easy to see that the complexity of obtaining $|y\rangle$ equals $O((T_{in}+\log m)s/\|y\|)$. A direct corollary of this LCU is
Proposition 3. For any vector $x = (x_0,\ldots,x_{m-1})$, its quantum state can be prepared in time $O(\kappa(x)\log m)$, where $\kappa(x) = \max_k|x_k|/\min_{k, x_k\neq 0}|x_k|$.
Proof. Assume that all entries of $x$ are nonzero; otherwise it suffices to focus on the nonzero entries of $x$. To prepare $|x\rangle$, one just needs to choose $|v_j\rangle = |j\rangle$, in which case $O(T_{in}) = O(1)$. So the complexity is $O((T_{in}+\log m)\sum_j|\alpha_j|/\|y\|) = O(\kappa(x)\log m)$, since $\|y\| \ge m\min_j|x_j|$ and $s \le m\max_j|x_j|$.
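The following NumPy sketch mirrors the LCU procedure classically under the stated assumptions (normalized $|v_j\rangle$ and coefficients $\alpha_j$ given explicitly): it forms $y=\sum_j\alpha_j|v_j\rangle$ and reports the post-selection amplitude $\|y\|/s$ that sets the $O((T_{in}+\log m)s/\|y\|)$ cost. The function name is illustrative.
```python
import numpy as np

def lcu_combine(alphas, states):
    """Classical sketch of the LCU procedure described above.

    Returns the normalized combination |y> ~ sum_j alpha_j |v_j> and the
    post-selection amplitude ||y|| / s with s = sum_j |alpha_j|.
    """
    alphas = np.asarray(alphas, dtype=complex)
    vs = [np.asarray(v, dtype=complex) / np.linalg.norm(v) for v in states]
    s = float(np.sum(np.abs(alphas)))
    y = sum(a * v for a, v in zip(alphas, vs))
    amp = np.linalg.norm(y) / s                    # amplitude of the |0> flag
    return y / np.linalg.norm(y), amp

# Example: prepare |x> for x = (3, 1, 2) as an LCU of the basis states |j>.
x = np.array([3.0, 1.0, 2.0])
state, amp = lcu_combine(x, np.eye(3))
print(state)                                       # equals x / ||x||_2
print(amp, np.linalg.norm(x) / np.sum(np.abs(x)))  # both equal ||x||_2 / ||x||_1
```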
The above result is the same as the result given by Clader et al. [10]. Actually,
based on the LCU given above, the quantum state can be prepared more efficiently.
Proposition 4. For any vector $x = (x_0,\ldots,x_{m-1})$, its quantum state can be prepared in time $O(\sqrt{\log\kappa(x)}\,\log m)$.
Proof. For simplicity, we assume that $|x_0| = \min_{k, x_k\neq 0}|x_k|$. Find the minimal $q$ such that $\kappa(x) \le 2^q$, so $q \approx \log\kappa(x)$. For any $1 \le j \le q$, there are several entries of $x$ whose absolute values lie in the interval $[2^{j-1}|x_0|, 2^j|x_0|)$. Define $y_j$ as the $m$-dimensional vector obtained by filling these entries into the same positions as in $x$ and zeros into the other positions. Then $x = y_1 + \cdots + y_q$. For any $j$, we have $\kappa(y_j) \le 2$, so the quantum state $|y_j\rangle$ of the vector $y_j$ can be prepared efficiently in time $O(\log m)$ by Proposition 3. We also have $|x\rangle = \lambda_1|y_1\rangle + \cdots + \lambda_q|y_q\rangle$, where $\lambda_j = \|y_j\|/\|x\|$. From the LCU method given above, the complexity of achieving such a linear combination to get $|x\rangle$ equals $O((\log m)\sum_{j=1}^{q}\|y_j\|/\|x\|) = O(\sqrt{q}\,\log m) = O(\sqrt{\log\kappa(x)}\,\log m)$, where we used the Cauchy-Schwarz bound $\sum_{j=1}^{q}\|y_j\| \le \sqrt{q}\,\|x\|$, which holds since the $y_j$ have disjoint supports.
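A short sketch of the bucketing step used in the proof above may help: nonzero entries are grouped by magnitude into $q\approx\log\kappa(x)$ dyadic buckets $y_j$ with $\kappa(y_j)\le 2$, and $|x\rangle$ is then an LCU of the $|y_j\rangle$. The helper name is illustrative.
```python
import numpy as np

def dyadic_split(x):
    """Sketch of the decomposition x = y_1 + ... + y_q used in the proof above."""
    x = np.asarray(x, dtype=float)
    mags = np.abs(x[x != 0.0])
    x0, xmax = mags.min(), mags.max()
    q = max(1, int(np.ceil(np.log2(xmax / x0))))
    ys = []
    for j in range(1, q + 1):
        mask = (np.abs(x) >= 2 ** (j - 1) * x0) & (np.abs(x) < 2 ** j * x0)
        if j == q:                       # put the maximal entries in the last bucket
            mask |= np.abs(x) == xmax
        ys.append(np.where(mask, x, 0.0))
    return ys

x = np.array([0.1, 0.0, 3.0, -0.4, 1.2])
ys = dyadic_split(x)
print(np.allclose(sum(ys), x))                  # True: the buckets sum back to x
print([float(np.linalg.norm(y)) for y in ys])   # LCU weights, up to 1/||x||
```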
Compared with LCU, the parallel swap test achieves a similar task, except that the coefficients $\alpha_j$ are not given directly. One main use of the parallel swap test in the following simulation of the multilayer perceptron is this: given $2l$ quantum states $|v_j^{\pm}\rangle$, we can compute the inner products $\langle v_j^+|v_j^-\rangle$ in parallel. More precisely, first prepare the quantum state
$$\frac{1}{\sqrt{l}}\sum_{j=0}^{l-1}|j\rangle \otimes \frac{1}{\sqrt{2}}\big(|v_j^+\rangle|0\rangle + |v_j^-\rangle|1\rangle\big). \qquad\qquad (8)$$
For each $\frac{1}{\sqrt{2}}(|v_j^+\rangle|0\rangle + |v_j^-\rangle|1\rangle)$, there is a unitary operator $G_j$ whose eigenvalues contain the information about $\langle v_j^+|v_j^-\rangle$. So apply quantum phase estimation to $\sum_j |j\rangle\langle j| \otimes G_j$
Figure 2. Linearly separable training samples. There is a hyperplane dividing the two classes.
Rosenblatt also defined a learning algorithm (or updating rule) to adjust the weights of the perceptron when it makes wrong decisions. More precisely, given a set of training samples $\{(x^t, r^t): t = 0,1,\ldots,d-1\}$, where $r^t$ is the desired output of $x^t$, that is
$$r^t = \begin{cases} 1, & \text{if } x^t \text{ belongs to the first class;} \\ -1, & \text{if } x^t \text{ belongs to the second class,} \end{cases} \qquad\qquad (11)$$
the updating rule is defined as
$$w_{\rm new} = w_{\rm old} + \eta(r^t - y^t)\,x^t \qquad\qquad (12)$$
for a randomly chosen sample $(x^t, r^t)$, where $\eta \in [0,1]$ is called the learning factor and $y^t = \varphi(x^t\cdot w_{\rm old})$.
Rosenblatt proved that this learning rule converges after a finite number of iterations, a result now known as the perceptron convergence theorem. Briefly, from the updating rule (12): if $r^t = y^t$, there is no update. If $r^t > y^t$, that is $r^t = 1$ and $y^t = -1$, then $w_{\rm new} = w_{\rm old} + 2\eta x^t$. Since $r^t = 1$, the sample $x^t$ belongs to the first class, while $y^t = -1$ means we made the wrong decision of putting $x^t$ into the second class, so we should increase the value of $y^t$. With the updated weight $w_{\rm new}$, the new output of $x^t$ equals $\varphi(x^t\cdot w_{\rm old} + 2\eta\, x^t\cdot x^t)$, whose argument is larger than that of the old output $\varphi(x^t\cdot w_{\rm old})$. A similar analysis holds if $r^t < y^t$. More details about the perceptron convergence theorem can be found in [13, Chapter 1]. Such a learning algorithm is also called online learning [1], since training only requires one sample at each step. Online learning has many advantages: for example, it saves the cost of storing the training samples and it is easy to implement [1, Chapter 11], [13, Chapter 4].
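For reference, a minimal NumPy sketch of Rosenblatt's online rule (12) with the threshold activation; the data, learning factor and step count are illustrative.
```python
import numpy as np

def perceptron_online(X, r, eta=0.5, steps=400, rng=np.random.default_rng(1)):
    """Sketch of Rosenblatt's online learning rule (12).

    At each step a random sample (x^t, r^t) is drawn and the weights are updated
    by w <- w + eta * (r^t - y^t) * x^t, with y^t = phi(x^t . w) in {-1, +1}.
    """
    X = np.asarray(X, dtype=float)
    r = np.asarray(r, dtype=float)          # desired outputs in {-1, +1}
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        t = rng.integers(len(X))
        y = 1.0 if X[t] @ w >= 0 else -1.0  # threshold activation
        w += eta * (r[t] - y) * X[t]        # updates only when the decision is wrong
    return w

# Toy linearly separable data; the second coordinate is a constant bias input.
X = np.array([[2.0, 1.0], [1.5, 1.0], [-1.0, 1.0], [-2.5, 1.0]])
r = np.array([1, 1, -1, -1])
w = perceptron_online(X, r)
print(np.where(X @ w >= 0, 1, -1))   # reproduces r after convergence
```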
(1) Preparing the quantum state in Step 3: at the $j$-th step of iteration, the complexity is $O(j\log m)$, since we perform no measurements.
(2) Estimating the value of $y^t$ by the swap test: at the $j$-th step of iteration, the coefficient is $\lambda_j$. So to make the error in estimating $y^t$ of size $\epsilon$, the error chosen in the swap test is $\epsilon/\lambda_j$. The complexity of estimating $y^t$ is then $O(j^{3/2}\eta(\log m)/\epsilon)$, since the quantum state preparation costs $O(j\log m)$ as discussed in (1).
Therefore, the final complexity of the learning algorithm of a perceptron on a quantum computer equals
$$O(n^{3/2}(\log m)\eta/\epsilon + n\log m). \qquad\qquad (13)$$
On a classical computer, estimating the inner product of $x$ and $w$ costs $O(m)$, so the complexity of the learning algorithm of a perceptron after $n$ steps of iteration on a classical computer is $O(mn)$. Compared to the classical learning algorithm, the quantum algorithm achieves an exponential speedup in $m$; as a compensation, its dependence on $n$ is worse than that of the classical algorithm.
The most commonly used activation function in multilayer perceptrons is the sigmoid
the output layer. The error (called the error energy averaged over the training samples, or the empirical risk) of this network is defined by $E = \frac{1}{2}\sum_{i,t}(r_i^t - z_i^t)^2$, where $z^t$ is the vector with components $z_i^t = \mathrm{sigmoid}(y^t\cdot v_i)$ for all $i$, and $y^t$ is the vector with components $y_j^t = \mathrm{sigmoid}(x^t\cdot w_j)$ for all $j$. The back-propagation learning algorithm is obtained from the gradient descent method [1, 13] as follows. In the gradient descent method, we need to compute the gradient of the error function with respect to the weights $w_j$ and $v_i$. First, consider the gradient of $E$ with respect to $v_j$. A direct calculation shows that
$$\frac{\partial E}{\partial v_{jl}} = -\sum_t (r_j^t - z_j^t)\frac{\partial z_j^t}{\partial v_{jl}} = -\sum_t (r_j^t - z_j^t)\, z_j^t(1 - z_j^t)\, y_l^t.$$
So the updating rule of $v_j$ is
$$v_j^{\rm new} = v_j^{\rm old} + \eta\sum_t (r_j^t - z_j^t)\, z_j^t(1 - z_j^t)\, y^t. \qquad\qquad (15)$$
As for the gradient of $E$ with respect to $w_j$, the chain rule gives
$$\frac{\partial E}{\partial w_{jk}} = -\sum_{i,t}(r_i^t - z_i^t)\frac{\partial z_i^t}{\partial y_j^t}\frac{\partial y_j^t}{\partial w_{jk}} = -\sum_{i,t}(r_i^t - z_i^t)\, z_i^t(1 - z_i^t)\, v_{ij}\, y_j^t(1 - y_j^t)\, x_k^t.$$
The corresponding online updating rules, obtained from (15) and (16) by using a single randomly chosen training sample $(x^t, r^t)$ at each step, are
$$v_j^{\rm new} = v_j^{\rm old} + \eta\,(r_j^t - z_j^t)\, z_j^t(1 - z_j^t)\, y^t, \qquad w_j^{\rm new} = w_j^{\rm old} + \eta\sum_i (r_i^t - z_i^t)\, z_i^t(1 - z_i^t)\, v_{ij}\, y_j^t(1 - y_j^t)\, x^t. \qquad (17)$$
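To fix notation, here is a compact NumPy sketch of one online back-propagation step implementing the updates (17) for the two-layer network ($W$ holds the rows $w_j$, $V$ the rows $v_i$); shapes and the learning factor are illustrative.
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_backprop_step(W, V, x, r, eta=0.1):
    """Sketch of one online back-propagation step following the updates (17).

    W has shape (n, m) with rows w_j (first layer); V has shape (p, n) with
    rows v_i (output layer); x is an m-dimensional sample, r its p-dim target.
    """
    y = sigmoid(W @ x)                             # y_j = sigmoid(x . w_j)
    z = sigmoid(V @ y)                             # z_i = sigmoid(y . v_i)
    delta_out = (r - z) * z * (1.0 - z)            # (r_i - z_i) z_i (1 - z_i)
    V_new = V + eta * np.outer(delta_out, y)       # v_i <- v_i + eta * delta_i * y
    delta_hid = (V.T @ delta_out) * y * (1.0 - y)  # sum_i delta_i v_ij, times y_j (1 - y_j)
    W_new = W + eta * np.outer(delta_hid, x)       # w_j <- w_j + eta * delta_j * x
    return W_new, V_new

rng = np.random.default_rng(0)
m, n, p = 4, 3, 2
W, V = rng.normal(size=(n, m)), rng.normal(size=(p, n))
x, r = rng.normal(size=m), np.array([0.0, 1.0])
W, V = online_backprop_step(W, V, x, r)
```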
5.2. Simulation in quantum computer: reading the output data of each layer
Different from a perceptron, the outputs of each layer and the learning algorithms of a multilayer perceptron exhibit strongly nonlinear features, while a quantum computer only allows unitary operations, which are linear. This is one reason why it is so hard to integrate the good features of the multilayer perceptron and quantum computing into a useful model on a quantum computer. However, we will show in this and the next subsection that integrating the nonlinear structure of multilayer perceptrons into a quantum computer is not as difficult as it looks.
In this subsection, we concentrate on reading the outputs of each layer. First, we show how to get the output vector $y$ of the first layer. Note that its $i$-th component equals $y_i = \mathrm{sigmoid}(x\cdot w_i)$, where $i = 0,1,\ldots,n-1$. If we computed all the components by the swap test one by one, the complexity would be at least $O(n)$. In the following, we give another method to construct the quantum data of $y$. For simplicity, in this subsection we use $\varphi$ to denote the sigmoid function, although it was used to denote the threshold function in Rosenblatt's perceptron. The procedure for preparing the quantum state of $y$ is as follows:
Step 4. Apply a controlled rotation on the above quantum state and undo Steps 2-3 to obtain
$$\frac{1}{\sqrt{n}}\sum_{i=0}^{n-1}|i\rangle\Big[\varphi(x\cdot w_i)|0\rangle + \sqrt{1-\varphi(x\cdot w_i)^2}\,|1\rangle\Big]
= \frac{1}{\sqrt{n}}\sum_{i=0}^{n-1}\varphi(x\cdot w_i)|i\rangle|0\rangle + \frac{1}{\sqrt{n}}\sum_{i=0}^{n-1}\sqrt{1-\varphi(x\cdot w_i)^2}\,|i\rangle|1\rangle. \qquad (19)$$
The controlled operation in Step 2 works as follows: first prepare the quantum state $|i\rangle \otimes \frac{1}{\sqrt{\|x\|^2+\|w_i\|^2}}\big[\|x\|\,|0\rangle + \|w_i\|\,|1\rangle\big]$, then conditionally prepare $|x\rangle$ and $|w_i\rangle$ controlled on the qubit values $|0\rangle$ and $|1\rangle$. Because of the control register $|i\rangle$, these quantum states can be prepared in parallel in Step 2.
In Step 3, the swap test returns a value $T_i$ such that $|T_i - x\cdot w_i/\sqrt{\|x\|^2+\|w_i\|^2}| \le \epsilon$, which leads to a good approximation of $x\cdot w_i$ with a small relative error. The first part of (19) is the desired quantum state $|y\rangle$ of $y$. As in the discussion of the quantum simulation of a perceptron in section 4, we do not need to perform a measurement to get $|y\rangle$ exactly. Therefore, the final complexity of the four steps is
$$O((\log mn)/\epsilon). \qquad\qquad (20)$$
This achieves an exponential speedup over the classical method, whose complexity is $O(mn)$.
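As a classical cross-check of the structure of (19), the following NumPy sketch builds the two branches of amplitudes; renormalizing the $|0\rangle$ branch gives the quantum data $|y\rangle$ of the first-layer output. The function name and toy numbers are illustrative.
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def state_19_branches(x, W):
    """Sketch of the two branches of amplitudes in the state (19).

    W has shape (n, m) with rows w_i; the |0> branch carries phi(x . w_i)/sqrt(n),
    and renormalizing it gives the quantum data |y> of the first-layer output.
    """
    n = W.shape[0]
    y = sigmoid(W @ x)                            # classical first-layer outputs
    branch0 = y / np.sqrt(n)                      # amplitudes on |i>|0>
    branch1 = np.sqrt(1.0 - y ** 2) / np.sqrt(n)  # amplitudes on |i>|1>
    assert np.isclose(np.sum(branch0 ** 2) + np.sum(branch1 ** 2), 1.0)
    return branch0, branch1

x = np.array([0.2, -0.5, 1.0])
W = np.array([[0.3, 0.1, -0.2], [0.7, -0.4, 0.5]])
b0, _ = state_19_branches(x, W)
print(b0 / np.linalg.norm(b0))   # the state |y>, proportional to sigmoid(W @ x)
```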
Remark 1. In the following, any quantum state similar to (19) will be abbreviated as $\frac{1}{\sqrt{n}}\sum_{i=0}^{n-1}\varphi(x\cdot w_i)|i\rangle|0\rangle + |0\rangle^{\perp}$; that is, we only show the part we are interested in. Here $|0\rangle^{\perp}$ carries two meanings: the hidden quantum state is orthogonal to the first part, and there exists a qubit that distinguishes the desired and undesired quantum states. Sometimes there may be several qubits in $|0\rangle$ (such as $|0,0\rangle$), but we will still write $|0\rangle$ when the number of qubits is not important.
The preparation of the quantum state of the vector $z$ in the output layer is similar to the above procedure. In this case, the advantage of performing no measurement in (19) becomes clear. Denote the quantum state in (19) by $|Y\rangle$; then the quantum state of $z$ can be obtained in the following way:
Step 4. Apply a controlled rotation on the above quantum state and undo Steps 2-3 to obtain
$$\frac{1}{\sqrt{p}}\sum_{i=0}^{p-1}\varphi(y\cdot v_i)|i\rangle|0\rangle + |0\rangle^{\perp}. \qquad\qquad (22)$$
The whole procedure is the same as the preparation of the quantum information of $y$. The only difference is Step 2, which now uses $|Y\rangle$ and $|v_i\rangle$. Similarly, in Step 3, we obtain a value $R_i$ such that
$$\Big|R_i - \frac{1}{\sqrt{n}}\frac{y\cdot v_i}{\sqrt{1+\|v_i\|^2}}\Big| \le \epsilon' \qquad\qquad (23)$$
in time $O((\log mnp)/\epsilon')$. So to make $R_i\sqrt{n}$ a good approximation of $y\cdot v_i$ with small relative error $\epsilon$, we choose $\epsilon'$ such that $\epsilon'\sqrt{n} = \epsilon$. Therefore, the final complexity of the above procedure is
$$O((\log mnp)\sqrt{n}/\epsilon^2). \qquad\qquad (24)$$
Note that the classical algorithm to get the output has complexity $O(n(m+p))$. So the quantum algorithm achieves a quadratic speedup in $n$ and an exponential speedup in $m, p$. The quantum information of $z$ lies in the first part of the quantum state (22).
Remark 2. In the above construction, if we change the sigmoid function into a more general form, the complexity can be improved. More precisely, in the preparation of the quantum state of $z$, if we change the function $\mathrm{sigmoid}(y\cdot v_i)$ into $\mathrm{sigmoid}(y\cdot v_i, 1/\sqrt{n})$, then the complexity of preparing the quantum state (22) becomes simply $O((\log mnp)/\epsilon^2)$. This really achieves an exponential speedup over the classical method.
Step 3.2'. Apply the Hadamard transformation on the second register; then we have
$$\frac{s/\sqrt{n}}{\sqrt{4 + [s\eta(r_j^t - z_j^t)z_j^t(1-z_j^t)]^2}}\Big[\|v_j\|\,|v_j\rangle + \eta(r_j^t - z_j^t)z_j^t(1-z_j^t)\,\|y^t\|\,|y^t\rangle\Big]|0\rangle + |0\rangle^{\perp},$$
which is proportional to $\|v_j^{\rm new}\|\,|v_j^{\rm new}\rangle|0\rangle$ plus orthogonal terms.
The main cost of the above learning algorithm of $v_j$ lies in the computation of $z_j^t$; after $N$ steps of iteration, the complexity is
$$O(N^{3/2}(\log mn)\sqrt{n}/\epsilon^2). \qquad\qquad (25)$$
The reason for the factor $N^{3/2}$ is the same as in (13). The classical online learning algorithm for $v_j$ has complexity $O(Nmn)$. Compared to the classical algorithm, the quantum online learning algorithm achieves a quadratic speedup in $n$ and an exponential speedup in $m$; however, the dependence on the number of iterations is worse. If we apply the sigmoid function given in Remark 2, then the corresponding quantum online learning algorithm can also achieve an exponential speedup in $n$ over the classical online learning algorithm.
The online learning rule (17) for $w_j^{\rm new}$ needs the summation $\sum_i (r_i^t - z_i^t)z_i^t(1-z_i^t)v_{ij}$. This time, we need to calculate the $z_i^t$ in parallel. From (21), with $|Y\rangle$ changed into $|Y^t\rangle$, we can get
$$\frac{1}{\sqrt{p}}\sum_{i=0}^{p-1}|i\rangle\otimes\Big(\sqrt{\tilde{s}(r_i^t-z_i^t)z_i^t(1-z_i^t)v_{ij}}\,|0\rangle + \sqrt{1-\tilde{s}(r_i^t-z_i^t)z_i^t(1-z_i^t)v_{ij}}\,|1\rangle\Big), \qquad (26)$$
where $\tilde{s} = 1/\max_i\{(r_i^t-z_i^t)z_i^t(1-z_i^t)v_{ij}\}$. In the above, we assume that $(r_i^t-z_i^t)z_i^t(1-z_i^t)v_{ij} > 0$; otherwise we can add another qubit to deal with the negative parts. In the
first part, the probability of $|0\rangle$ equals $\tilde{s}\,p^{-1}\sum_i (r_i^t-z_i^t)z_i^t(1-z_i^t)v_{ij}$. By the swap test, this
Step 5. Apply a controlled rotation and undo the parallel swap test to prepare
$$\frac{1}{\sqrt{d}}\sum_{t=0}^{d-1}|t\rangle|Y^t\rangle\Big[s\eta(r_j^t-z_j^t)z_j^t(1-z_j^t)|0\rangle + \sqrt{1-\big(s\eta(r_j^t-z_j^t)z_j^t(1-z_j^t)\big)^2}\,|1\rangle\Big],$$
where $s = 1/\max_t \eta(r_j^t-z_j^t)z_j^t(1-z_j^t) \approx 1/4\eta$.
Step 6. Apply the Hadamard transformation on the first register $|t\rangle$ to get
$$\frac{s}{\sqrt{d^2 n}}\sum_{t=0}^{d-1}\eta(r_j^t-z_j^t)z_j^t(1-z_j^t)\,\|y^t\|\,|y^t\rangle|0\rangle + |0\rangle^{\perp}.$$
Similarly to the complexity analysis of (24), the whole complexity of obtaining the quantum state (29) is
$$O((\log dmn)\sqrt{n}/\epsilon^2). \qquad\qquad (30)$$
Next, we consider the simulation of the updating rule (16) of $w_j$ on a quantum computer. This requires more information than updating $v_j$: (1) all the quantum states of $x^t$, which can be prepared in advance; (2) the summation $\sum_i(r_i^t-z_i^t)z_i^t(1-z_i^t)v_{ij}$. The underlying idea is similar; however, the description is now a little more complicated than above because of the summation.
where $\tilde{s} = 1/\max_i(r_i^t-z_i^t)z_i^t(1-z_i^t)v_{ij} \approx 1/4\max_i v_{ij}$ and we assume that $(r_i^t-z_i^t)z_i^t(1-z_i^t)v_{ij} \ge 0$; otherwise we can use another copy to deal with the negative parts. The probability of $|0\rangle$ equals $p^{-1}\tilde{s}\sum_{i=0}^{p-1}(r_i^t-z_i^t)z_i^t(1-z_i^t)v_{ij}$.
Step 5. Apply the parallel swap test to prepare
$$\frac{1}{\sqrt{d}}\sum_{t=0}^{d-1}|t\rangle|x^t\rangle|y_j^t\rangle \otimes \frac{1}{\sqrt{p}}\sum_{i=0}^{p-1}|i\rangle\Big(\sqrt{\tilde{s}(r_i^t-z_i^t)z_i^t(1-z_i^t)v_{ij}}\,|0\rangle + \sqrt{1-\tilde{s}(r_i^t-z_i^t)z_i^t(1-z_i^t)v_{ij}}\,|1\rangle\Big)\otimes\Big|\sum_{i=0}^{p-1}(r_i^t-z_i^t)z_i^t(1-z_i^t)v_{ij}\Big\rangle,$$
Next, we give a complexity analysis of the above procedure. The first step costs $O(\log dp)$. The cost of the second step is the preparation of $|Y^t\rangle$, which equals $O((\log mn)/\epsilon)$ by (20). The third step applies the swap test to estimate $z_i^t$; however, $|Y^t\rangle$ contains a coefficient $1/\sqrt{n}$ in (31), so the cost of this step is $O((\log dmnp)\sqrt{n}/\epsilon^2)$. The fourth step costs $O(1)$. Because of the factor $1/\sqrt{p}$, the fifth
step costs $O((\log dmnp)\sqrt{np}/\epsilon^2)$. The sixth step costs $O(1)$, and the seventh and eighth steps cost $O(\log d)$. Therefore, the total cost of the eight steps is
$$O((\log dmnp)\sqrt{np}/\epsilon^2). \qquad\qquad (33)$$
Further updates of $v_j$ and $w_j$ are similar to the above two algorithms; the difference is that in later updates we should use the weight-vector states (29) and (32) instead of $|v_j\rangle, |w_j\rangle$. Assume that the learning algorithm stops after $N$ steps of iteration; then the complexities of updating $v_j$ and $w_j$ are
$$O((\log dmn)\sqrt{n}/\epsilon^N) \quad\text{and}\quad O((\log dmnp)\sqrt{np}/\epsilon^N), \qquad\qquad (34)$$
respectively. By an analysis similar to Remark 2, the complexities (34) can be improved to $O((\log dmn)/\epsilon^N)$ and $O((\log dmnp)/\epsilon^N)$ by choosing suitable sigmoid functions in each layer. If the number $N$ of iterations is not large, this achieves an exponential speedup over the classical batch learning algorithm, whose complexity is $O(Ndmn)$ and $O(Ndn(m+p))$, respectively.
The complexity (34) of the batch learning algorithm is exponential in the number $N$ of iterations. Just as in online learning, we can compute all the desired coefficients in advance. This reduces the dependence on $N$ to polynomial; however, the dependence on $d, m, n, p$ then becomes worse than (34).
6. Learning of the Hopfield neural network
The Hopfield neural network (HNN) is a model of recurrent neural network [15] with Hebb's learning rule [14]. HNNs serve as associative memory systems with binary threshold neurons. A variant of HNN with a continuous input range was proposed in [16]. The method introduced in this paper cannot be applied directly to learn an HNN on a quantum computer; however, an amended version still works. In this section, we will not focus too much on the details of the learning algorithm of HNN on a quantum computer. The reader can find more introductions to HNN in [13] and [24, Chapter 13]; a brief introduction to HNN can also be found in [29]. The idea about the quantum HNN model applied here is different from [23], which studies HNN via density matrices and related techniques (such as quantum principal component analysis and the HHL algorithm).
Assume that $X = (x_{ij})_{P\times N}$ is a $P\times N$ matrix whose rows store the firing patterns of the HNN. Denote the $i$-th row of $X$ by $u_i$ and the $j$-th column of $X$ by $v_j$. The weight matrix $W = (w_{ij})_{N\times N}$ of the HNN satisfies $w_{ij} = w_{ji}$ and $w_{ii} = 0$. Denote the $j$-th column of $W$ by $w_j$. Hebb's learning rule for the HNN is defined as
$$w_{ij} = \frac{1}{P}\, v_i\cdot v_j, \qquad\qquad (35)$$
and the update rule of $X$ is defined by
$$x_{ij} = \varphi(u_i\cdot w_j), \qquad\qquad (36)$$
where $\varphi$ is the threshold function (10) if all $x_{ij}$ take discrete values in $\{-1,1\}$, or the sigmoid function (14) if the $x_{ij}$ have continuous ranges.
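For comparison with the quantum procedure below, here is a classical NumPy sketch of Hebb's rule (35) and the synchronous update (36) for $\{-1,1\}$-valued patterns; the example patterns are illustrative.
```python
import numpy as np

def hebb_weights(X):
    """Sketch of Hebb's learning rule (35) for a Hopfield network.

    X has shape (P, N); its rows are the firing patterns.  The weight matrix
    is w_ij = (1/P) v_i . v_j with the diagonal set to zero, where v_j is the
    j-th column of X.
    """
    P = X.shape[0]
    W = (X.T @ X) / P          # (1/P) v_i . v_j
    np.fill_diagonal(W, 0.0)   # w_ii = 0
    return W

def hnn_update(X, W):
    """One synchronous update step (36) with the threshold activation."""
    return np.where(X @ W >= 0, 1, -1)

patterns = np.array([[1, -1, 1, -1],
                     [1, 1, -1, -1]])
W = hebb_weights(patterns)
print(hnn_update(patterns, W))   # the stored patterns are fixed points here
```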
Since the input of the HNN is the matrix $X$, it is better to consider the quantum state
$$|X\rangle = \frac{1}{\|X\|_F}\sum_{i=0}^{P-1}\sum_{j=0}^{N-1}x_{ij}|i,j\rangle = \frac{1}{\|X\|_F}\sum_{i=0}^{P-1}\|u_i\|\,|i,u_i\rangle = \frac{1}{\|X\|_F}\sum_{j=0}^{N-1}\|v_j\|\,|v_j,j\rangle, \qquad (37)$$
where $\|X\|_F = \sqrt{\sum_{i,j}|x_{ij}|^2}$ is the Frobenius norm of $X$. Similarly, one can define $|W\rangle$. We consider the problem of obtaining the quantum state $|W\rangle$ given only $|X\rangle$. Consider the quantum state
$$\frac{1}{N}\sum_{i,j=0}^{N-1}|i,j\rangle|+\rangle|X\rangle = \frac{1}{N\|X\|_F}\sum_{i,j,k=0}^{N-1}\|v_k\|\,|i,j\rangle|+\rangle|v_k,k\rangle. \qquad\qquad (38)$$
For any $i, j$, define
$$\delta_{i,j}^{(0)} = \begin{cases}1, & \text{if } i = j;\\ 0, & \text{if } i \neq j,\end{cases} \qquad\qquad \delta_{i,j}^{(1)} = \begin{cases}1, & \text{if } i = j;\\ 2, & \text{if } i \neq j.\end{cases}$$
For $|i,j\rangle|0\rangle|v_k,k\rangle$, we change it into $|i,j\rangle|0\rangle|v_k,k\rangle|\delta_{i,k}^{(0)}\rangle$; when $i = k$, we produce a copy of $|j\rangle$ to get $|i,j\rangle|0\rangle|v_i,i,j\rangle|1\rangle$. Similarly, for $|i,j\rangle|1\rangle|v_k,k\rangle$, we change it into $|i,j\rangle|1\rangle|v_k,k\rangle|\delta_{j,k}^{(1)}\rangle$; when $j = k$, we produce a copy of $|i\rangle$ to get $|i,j\rangle|1\rangle|v_j,i,j\rangle|1\rangle$. Performing these operations on (38), we obtain
$$\frac{1}{\sqrt{2}\,N\|X\|_F}\sum_{i,j=0}^{N-1}|i,j\rangle\Big[|0\rangle\big(\|v_i\|\,|v_i,i,j\rangle|1\rangle + |\phi_0\rangle|0\rangle\big) + |1\rangle\big(\|v_j\|\,|v_j,i,j\rangle|1\rangle + |\phi_2\rangle|2\rangle\big)\Big], \qquad (39)$$
where $|\phi_0\rangle, |\phi_2\rangle$ are some undesired quantum states. Note that the inner product of $\|v_i\|\,|v_i,i,j\rangle|1\rangle + |\phi_0\rangle|0\rangle$ and $\|v_j\|\,|v_j,i,j\rangle|1\rangle + |\phi_2\rangle|2\rangle$ equals $\|v_i\|\|v_j\|\langle v_i|v_j\rangle = P w_{ij}$. By Proposition 2, and undoing the above procedure, we obtain
$$\frac{P}{N\|X\|_F}\sum_{i,j=0}^{N-1}|i,j\rangle\Big[s w_{ij}|0\rangle + \sqrt{1-s^2 w_{ij}^2}\,|1\rangle\Big] = \frac{P s\|W\|_F}{N\|X\|_F}\,|W\rangle|0\rangle + |0\rangle^{\perp}, \qquad (40)$$
where $s = 1/\max_{i,j}|w_{ij}|$. The whole cost of obtaining (40) is $O((\log(PN))/\epsilon)$. The update rule (36) can be treated similarly by considering $\frac{1}{\sqrt{2}}(|0\rangle|X\rangle + |1\rangle|W\rangle)$ instead of $|+\rangle|X\rangle$ in (38). When the number of iterations is not large, this algorithm achieves an exponential speedup over the classical Hebb learning rule for the HNN. The above idea was applied in [32] to study matrix multiplication by the parallel swap test, achieving a much better performance than SVE [21] or the HHL algorithm [12]. Here we see that a similar idea also plays an important role in the learning of HNN on a quantum computer.
7. Conclusions
performance is much better than that of the classical algorithms when the number of layers or the number of iterations is not large. On the one hand, the parallel swap test technique plays an important role in the design of the quantum model of the multilayer perceptron and the Hopfield network, so it would be worthwhile to find more applications of the parallel swap test. Actually, we have already found one such application, in the Tikhonov regularization problem, which is used to deal with ill-posed inverse problems. On the other hand, given this quantum model of the multilayer perceptron and the Hopfield network, it remains a problem to find its applications in quantum machine learning. However, the learning algorithm of the multilayer perceptron is based on gradient descent, which has a slow convergence rate, so it may be worth considering learning algorithms based on Newton's method or quasi-Newton methods, where quantum computers can also achieve certain speedups under certain conditions. Finally, it remains an open problem to generalize this model to multilayer perceptrons with more than two layers.
Acknowledgments
This work is supported by the NSFC Project 11671388 and the CAS Frontier Key
Project QYZDJ-SSW-SYS022.
References
[1] Alpaydin E 2015 Introduction to Machine Learning (3rd edition, MIT press)
[2] Altaisky M V 2001 Quantum neural network arXiv:quant-ph/0107012v2
[3] Andrecut M, Ali M K 2002 Int. J. Mod. Phys. C 13 75
[4] Behrman E C, Nash L, Steck J E, Chandrashekar V, Skinner S R 2002 Inf. Sci. 128 257
[5] Berry D W, Childs A M, Cleve R, Kothari R, Somma R D 2015 Phys. Rev. Lett. 114 090502
[6] Bishop C M 1996 Neural Networks for Pattern Recognition (1st edition, Clarendon Press, Oxford)
[7] Bourlard H, Kamp Y 1988 Biological Cybernetics 59 291
[8] Buhrman H, Cleve R, Watrous J, Wolf R de 2001 Phys. Rev. Lett. 87 167902
[9] Childs A M, Kothari R, Somma R D 2017 SIAM J. Comput. 46 1920
[10] Clader B D, Jacobs B C, Sprouse C R 2013 Phys. Rev. Lett. 110 250504
[11] Fukunaga K 1990 Statistical Pattern Recognition (2nd edition, Academic Press, New York)
[12] Harrow A W, Hassidim A, Lloyd S 2009 Physical review letters 103 150502
[13] Haykin S 2009 Neural Networks and Learning Machines (3rd edition, Pearson)
[14] Hebb D O 2002 The organization of behavior: A neuropsychological theory (Lawrence Erlbaum)
[15] Hopfield J J 1982 Proc. Nat. Acad. Sci. 79 2554
[16] Hopfield J J 1984 Proc. Nat. Acad. Sci. 81 3088
[17] Hornik K, Stinchcombe M, White H 1989 Neural Networks 2 359
[18] Hui C-L 2011 Artificial Neural Networks-Application (InTech)
[19] Jain A K, Mao J C, Mohiuddin K M 1996 Computer 29 31
[20] Kak C S 1995 Adv. Imaging Electron Phys. 94 259
[21] Kerenidis I, Prakash A 2017 ITCS 49 1
[22] McCulloch W S, Pitts W 1943 Bull. Math. Biol. 5 115
[23] Rebentrost P, Bromley T R, Weedbrook C, Lloyd S 2017 A Quantum Hopfield Neural Network
arXiv:1710.03599v2
[24] Rojas R 1996 Neural Networks: A Systematic Introduction (Springer)
[25] Ricks B, Ventura D 2003 Advances in Neural Information Processing Systems: Proceedings of the
2003 Conference vol 16 (A Bradford Book) p 1
[26] Rosenblatt F 1958 Psychological Review 65 386
[27] Rumelhart D E, Hinton G E, Williams R J 1986 Nature 323 533
[28] Schiffman W H, Geffers H W 1993 Neural Networks 6 517
[29] Schuld M, Sinayskiy I, Petruccione F 2014 Quantum Inf. Process 13 2567
[30] Schuld M, Sinayskiy I, Petruccione F 2015 Phys. Lett. A 379 660
[31] Schuld M, Petruccione F 2018 Supervised Learning with Quantum Computers (Springer)
[32] Shao C P 2018 Quantum Algorithms to Matrix Multiplication arXiv:1803.01601v2
[33] Shao C P 2018 From linear combination of quantum states to Grover’s searching algorithm
arXiv:1807.09693v2
[34] Silva A J da, Ludermir T B, Oliveira W R de 2016 Neural Networks 76 55
[35] Siomau M 2014 Quantum Inf. Processing 13 1211
[36] Suykens J A K, Vandewalle J P L, DeMoor B L R 1996 Artificial Neural Networks for Modeling
and Control of Non-Linear Systems (Kluwer, Dordrecht)
[37] Wan K H, Dahlsten O, Kristjánsson H, Gardner R, Kim M S 2017 npj Quantum Inf. 3 36