demonstrated in [3, 15, 22, 8, 6], where experimental implementations are reported. We use a simple ring topology for the interconnection matrix a_ij. The resulting evolution equations are:

    x_0(n + 1) = sin(α x_{N−1}(n − 1) + β b_0 u(n)),                  (2a)
    x_i(n + 1) = sin(α x_{i−1}(n) + β b_i u(n)),                      (2b)

for i = 1, ..., N − 1, where α and β are feedback and input gain parameters, and the input mask b_i is drawn from a uniform distribution over the interval [−1, +1], as in [24, 22, 8].

The output y(n) of the reservoir computer is a linear combination of the states of the variables:

    y(n) = Σ_{i=0}^{N−1} w_i x_i(n),                                  (3)

where w_i are the readout weights. The aim is to get the output signal as close as possible to some target signal d(n). For that purpose, the readout weights are trained either offline (using standard linear regression methods) or online, as described in section 4.
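For illustration, the following Python sketch simulates the ring reservoir of equations (2a)-(2b) and the linear readout of equation (3). The reservoir size and gain values (N = 50, α = 0.9, β = 0.1) and the helper names are example choices made for the sketch, not parameters of the design described in this paper.

    import numpy as np

    def reservoir_states(u, N=50, alpha=0.9, beta=0.1, seed=0):
        """Ring reservoir of equations (2a)-(2b), driven by the input sequence u."""
        rng = np.random.default_rng(seed)
        b = rng.uniform(-1.0, 1.0, N)      # input mask b_i, uniform over [-1, +1]
        T = len(u)
        x = np.zeros((T + 1, N))           # x[n, i] holds x_i(n); x[0] is the initial state
        for n in range(T):
            # first node: driven by the last node one time step earlier, equation (2a)
            x_last = x[n - 1, N - 1] if n >= 1 else 0.0
            x[n + 1, 0] = np.sin(alpha * x_last + beta * b[0] * u[n])
            # remaining nodes: each driven by its left neighbour, equation (2b)
            x[n + 1, 1:] = np.sin(alpha * x[n, :-1] + beta * b[1:] * u[n])
        return x[1:]

    def readout(x, w):
        """Linear readout of equation (3): y(n) is the weighted sum of the states."""
        return x @ w
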
3. Channel equalisation
We focus on a single task to demonstrate the implementation of an online training algorithm on an FPGA board. We chose to work on a channel equalisation task introduced in [21]. This is a popular choice in the reservoir computing community, as it doesn't require the use of large reservoirs to obtain state-of-the-art performance, and it was used in e.g. [13, 24, 22, 8].

The aim is to reconstruct the input of a noisy nonlinear wireless communication channel from its output u(n). The input d(n) ∈ {−3, −1, 1, 3} is subject to symbol interference:

    q(n) = 0.08 d(n + 2) − 0.12 d(n + 1) + d(n)
           + 0.18 d(n − 1) − 0.1 d(n − 2) + 0.091 d(n − 3)
           − 0.05 d(n − 4) + 0.04 d(n − 5) + 0.03 d(n − 6)
           + 0.01 d(n − 7),                                           (4)

and nonlinear distortion:

    u(n) = q(n) + 0.036 q²(n) − 0.011 q³(n) + ν(n),                   (5)

where ν(n) = A · r(n) is the noise with amplitude A, and the r(n) are independent random numbers drawn (for ease of implementation on the FPGA board) from a uniform distribution over the interval [−1, +1]. In our experiments, we considered three levels of noise: low, medium and high, with amplitudes A_low = 0.1, A_medium = 0.5 and A_high = 1.0. These values were chosen for the sake of comparison with the 20 dB, 13 dB and 10 dB signal-to-noise ratios reported in [22, 8] (with Gaussian noise).

The aim of the channel equaliser is to reconstruct d(n) from u(n). The performance of the equaliser is measured using the Symbol Error Rate (SER), that is, the fraction of symbols that are wrongly reconstructed. For comparison, a 50-neuron reservoir computer with a sine nonlinearity can achieve a symbol error rate as low as 1.6 × 10⁻⁴ for a channel with a very low Gaussian noise level (SNR = 32 dB), see [22].

In wireless communications, the environment has a great impact on the quality of the signal. Given its highly variable nature, the properties of the channel may be subject to important changes in real time. To demonstrate the ability of the online training method to adapt to these variations, we studied the situation where the channel changes in real time. For this application we set the noise to zero and considered that the channel is slightly modified, with equation (5) replaced by three slightly different channels:

    u_1(n) = 1.00 q(n) + 0.036 q²(n) − 0.011 q³(n),                   (6a)
    u_2(n) = 0.85 q(n) + 0.036 q²(n) − 0.011 q³(n),                   (6b)
    u_3(n) = 0.70 q(n) + 0.036 q²(n) − 0.011 q³(n).                   (6c)

We regularly switched from one channel to another. The results of this experiment are presented in section 7.2.
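To make the channel model concrete, the following Python sketch implements equations (4), (5) and (6a)-(6c). The helper names and the zero-padding of the symbol sequence at its edges are our own choices for the example, not part of the experimental setup.

    import numpy as np

    SYMBOLS = np.array([-3, -1, 1, 3])

    def linear_channel(d):
        """Symbol interference of equation (4); the sequence is zero-padded at the edges."""
        taps = {2: 0.08, 1: -0.12, 0: 1.0, -1: 0.18, -2: -0.1,
                -3: 0.091, -4: -0.05, -5: 0.04, -6: 0.03, -7: 0.01}
        dp = np.pad(np.asarray(d, dtype=float), (7, 2))    # room for d(n-7) .. d(n+2)
        n = np.arange(len(d)) + 7                          # position of d(n) inside dp
        return sum(c * dp[n + k] for k, c in taps.items())

    def nonlinear_channel(q, p1=1.0, noise_amplitude=0.0, rng=None):
        """Nonlinear distortion of equation (5); p1 = 1.00, 0.85, 0.70 gives (6a)-(6c)."""
        if rng is None:
            rng = np.random.default_rng(0)
        nu = noise_amplitude * rng.uniform(-1.0, 1.0, len(q))   # nu(n) = A * r(n)
        return p1 * q + 0.036 * q**2 - 0.011 * q**3 + nu

    # Example: 1000 random symbols sent through the low-noise channel (A_low = 0.1).
    rng = np.random.default_rng(1)
    d = rng.choice(SYMBOLS, 1000)
    u = nonlinear_channel(linear_channel(d), noise_amplitude=0.1, rng=rng)
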
4. Gradient descent algorithm

Taking into account the linearity of the readout layer, finding the optimal readout weights w_i is very simple [13]. After having collected all the reservoir states x_i(n), the problem is reduced to solving a system of N linear equations. However, this so-called offline training method requires the use of a large set of inputs and reservoir states. For state-of-the-art experimental performance, one needs to generate thousands of symbols and thus hundreds of thousands of reservoir states. In our experiments [22, 8], the internal memory size of the arbitrary waveform generators used for this task can become the limiting factor for this method.

Here we use an entirely different approach. Instead of collecting and post-processing data, we generate data and train the reservoir in real time. That is, the inputs are generated one by one, and for every reservoir output a small correction, proportional to the error between the reservoir output and the target output, is applied to the readout weights. Our approach falls within the online training category, and we use the gradient descent algorithm (already used in [7]) to compute the corrections to the readout weights.

The gradient or steepest descent method is a way to find a local minimum of a function using its gradient [4]. For the channel equalisation task, this method provides the following rule for updating the readout weights:

    w_i(n + 1) = w_i(n) + λ (d(n) − y(n)) x_i(n).                     (7)
The step size λ should be small enough not to overshoot the minimum at every step, but big enough for a reasonable convergence time. In practice, we start with a high value λ_0 and then gradually decrease it during the training phase until a minimum value λ_min is reached, following the decay rule of equation (8), in which λ is lowered every k reservoir outputs.
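The following Python sketch illustrates the update rule of equation (7) together with such a decaying step size. Since equation (8) is not reproduced in this excerpt, the geometric decay by a factor γ applied every k outputs is an assumption, based on the parameter values quoted in section 7.1.

    import numpy as np

    def train_online(x, d, lmbda0=0.4, lmbda_min=0.0, gamma=0.999, k=100):
        """Online gradient descent of equation (7) with a decaying step size.

        x: reservoir states, shape (T, N); d: target symbols, shape (T,).
        The step size starts at lmbda0 and is multiplied by gamma every k outputs,
        never going below lmbda_min (assumed form of the schedule of equation (8)).
        """
        T, N = x.shape
        w = np.zeros(N)                 # readout weights w_i
        y = np.zeros(T)                 # reservoir outputs y(n)
        lmbda = lmbda0
        for n in range(T):
            y[n] = x[n] @ w                          # equation (3)
            w = w + lmbda * (d[n] - y[n]) * x[n]     # equation (7)
            if (n + 1) % k == 0:
                lmbda = max(lmbda * gamma, lmbda_min)
        return w, y

With λ_0 = 0.4, λ_min = 0, γ = 0.999 and k = 100, this matches the parameter values of equation (9) in section 7.1, up to the assumed form of the decay.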
Note that we didn't devote any effort to performance optimisation, and the execution-speed results are presented only for the sake of comparison. Our goal was to implement a simple online training algorithm on an embedded system, with no intention of competing against a regular CPU in terms of execution speed.
[Figure 2 block diagram: Sim, Discret, Out, RC, Train (with its Update and Check processes) and IP Cores entities; signals x_0(n), ..., x_{N−1}(n), w_i(n), w_i(n + 1), y(n), d(n) and SER; USB link from the FPGA to the PC.]
Figure 2. Simplified schematic of our design. The Sim entity simulates the nonlinear channel and the reservoir. It generates reservoir states x_i(n) and target outputs d(n). The RC entity produces the reservoir output y(n) after calculating the linear combination of discretised reservoir states and readout weights w_i(n). The Train entity implements the gradient descent algorithm to update the readout weights w_i(n + 1) and measures the performance of the reservoir in terms of symbol error rate (SER). The IP Cores box contains proprietary code for transferring data from the board to the computer.
The RC entity computes the linear combination of the discretised reservoir states and the readout weights (equation (3)) and provides the resulting reservoir output.

The training entity Train contains the implementation of the gradient descent algorithm. The Update process calculates the new readout weights and decreases the value of the λ parameter. The Check process counts the wrongly reconstructed symbols and sends the resulting SER and the reservoir output to the IP Cores, which transfers the data to the computer.

To take into account the possible variations of the communications channel (as described in section 3), the Update process monitors the performance of the reservoir computer. If it detects an increase of the symbol error rate above a certain threshold (we used SER_th = 3 × 10⁻²) after the training is complete (that is, when λ = λ_min), the λ parameter is reset to λ_0 and the training starts over again.
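The following Python sketch reproduces, in software, the logic of the Sim, RC and Train entities, including the SER-based reset of λ. The reservoir parameters, the length of the window over which the SER is measured and the test used to decide that training is complete are assumptions made for this example; they are not taken from the VHDL design.

    import numpy as np

    SYMBOLS = np.array([-3, -1, 1, 3])

    def nearest_symbol(y):
        """Round the analogue output y(n) to the closest symbol in {-3, -1, 1, 3}."""
        return SYMBOLS[np.argmin(np.abs(SYMBOLS - y))]

    def run_equaliser(u, d, N=50, alpha=0.9, beta=0.1, lmbda0=0.4, lmbda_min=0.0,
                      gamma=0.999, k=100, ser_threshold=3e-2, window=1000, seed=0):
        """Software analogue of the Sim / RC / Train loop of Figure 2."""
        rng = np.random.default_rng(seed)
        b = rng.uniform(-1.0, 1.0, N)      # input mask
        x_prev = np.zeros(N)               # x_i(n)
        x_prev2 = np.zeros(N)              # x_i(n - 1), needed by the first node
        w = np.zeros(N)                    # readout weights
        lmbda = lmbda0
        errors = 0
        for n in range(len(u)):
            # Sim: one step of the ring reservoir, equations (2a)-(2b)
            x = np.empty(N)
            x[0] = np.sin(alpha * x_prev2[N - 1] + beta * b[0] * u[n])
            x[1:] = np.sin(alpha * x_prev[:-1] + beta * b[1:] * u[n])
            # RC: linear combination of states and weights, equation (3)
            y = x @ w
            # Train / Update: gradient descent correction (7) and step-size decay
            w = w + lmbda * (d[n] - y) * x
            if (n + 1) % k == 0:
                lmbda = max(lmbda * gamma, lmbda_min)
            # Train / Check: running symbol error rate over a fixed window
            errors += int(nearest_symbol(y) != d[n])
            if (n + 1) % window == 0:
                ser = errors / window
                errors = 0
                # Restart training when the SER degrades after the step size has
                # essentially reached its floor (our reading of the reset rule).
                if ser > ser_threshold and lmbda < 0.01 * lmbda0:
                    lmbda = lmbda0
            x_prev2, x_prev = x_prev, x
        return w

The input u and the target d can be produced with the channel sketch given at the end of section 3.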
Precision and the representation of real numbers are of key importance when programming arithmetic operations on a bit-logic device. In our design, we implemented a fixed-point representation with 16-bit precision (1 bit for the sign, 4 bits for the integer part and 11 bits for the fractional part).
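To make this number format concrete, the short Python sketch below quantises a real value to a 16-bit fixed-point word with 4 integer and 11 fractional bits. The rounding and saturation behaviour shown here is an assumption made for the example, not a description of the arithmetic of the actual VHDL design.

    def to_fixed_point(value, int_bits=4, frac_bits=11):
        """Quantise a real number to a signed fixed-point word (1 sign bit, 16 bits total).

        The result is the nearest representable multiple of 2**-frac_bits,
        saturated to the range of a 16-bit two's-complement word.
        """
        scale = 1 << frac_bits                        # 2**11 = 2048 steps per unit
        raw = round(value * scale)
        max_raw = (1 << (int_bits + frac_bits)) - 1   # largest positive code, +15.9995...
        min_raw = -(1 << (int_bits + frac_bits))      # most negative code, -16.0
        raw = max(min(raw, max_raw), min_raw)
        return raw / scale                            # back to a float for inspection

    # The resolution is 2**-11, roughly 4.9e-4; values outside [-16, 16) saturate.
    print(to_fixed_point(0.0001), to_fixed_point(3.14159), to_fixed_point(20.0))
    # -> 0.0 3.1416015625 15.99951171875
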
Synthesis reports that our design uses only 7% of the board's slice LUTs for logic and arithmetic operations. The block RAM usage is much higher, at 77%, as it is used to store data on the board before it is transferred through the slow USB port. The current design is driven by a 33 MHz clock, with little effort devoted to optimisation and critical path analysis. In the future, timing performance will be improved by rewriting the design to take maximum advantage of the FPGA's concurrent logic.

[Figure 3: SER (×10⁻³) versus time (s, lower axis) and number of symbols (×10³, upper axis), for low, medium and high noise.]

Figure 3. Training results for various noise levels, expressed in terms of symbol error rate. Horizontal axes show both the time since the beginning of the experiment (lower axis) and the number of symbols n processed by the FPGA (upper axis).

7. Results

7.1. Constant channel

Figure 3 shows the symbol error rates captured from the FPGA board for different noise levels. Error rates are plotted against time, as well as against the number of reservoir computer outputs. The following parameter values were used for the gradient descent algorithm:

    λ_0 = 0.4,    λ_min = 0,    γ = 0.999,                            (9)

and the λ parameter was updated every 100 outputs (i.e. k = 100 in equation (8)). The asymptotic error rate is reached at approximately t = 0.4 s. The noise level in the channel has no impact on the training time, but increases the final symbol error rate. For a low-noise channel (with A_n = 0.1), the symbol error rate is as low as 5.8 × 10⁻³, that is, 5 to 6 errors per one thousand symbols. For medium
[Figure 4: SER (×10⁻³) versus time (s) and number of symbols (×10³), for slow, optimal and fast training.]

[Figure 5: SER (×10⁻³) and λ versus time (s) and number of RC outputs (×10³), for channels (6a), (6b) and (6c).]
Figure 4. Various training speeds depending on the update rate k. A fast training, with k = 10, gives a high SER of 1.55 × 10⁻², while a slow training, with k = 1000, produces a very low SER of 2.6 × 10⁻³ at the cost of a longer convergence time. The optimal training with k = 100 is a compromise between performance and speed.

Figure 5. Symbol error rate produced by the FPGA in the case of a variable communications channel. The channels are switched at preprogrammed times. The change in channel is followed immediately by a steep increase of the SER. The λ parameter is automatically reset to λ_0 = 0.4 every time a performance degradation is detected, and then returns to its minimum value as the equaliser adjusts to the new channel, bringing the SER down to its asymptotic value. The final value of the SER depends on the channel: the more non-linear the channel, the higher the resulting SER.
First of all, the chip is fast enough to interface in real time with experiments such as the optoelectronic reservoir computer reported in [22]. It will also allow much shorter experiment run times by removing the slow process of data transfer from the experiment to the computer. Moreover, it will solve the memory limitation issue, thus allowing the reservoir to be trained or tested over an arbitrarily long input sequence. Lastly, interfacing a physical reservoir computer with an FPGA chip in real time will make it possible to feed the output of the reservoir back into itself, thus enriching its dynamics and allowing new functionalities such as pattern generation [27].

The online learning algorithm will be particularly useful to train analog output layers, such as the ones reported in [26], where the main difficulty encountered was the necessity of a very precise model of the output layer. With an FPGA chip implementing online training based on the gradient descent algorithm, such a model will no longer be required.

On top of these results, the use of an FPGA chip paves the way towards fast reservoir computers, as all aforementioned experimental implementations are limited in speed by the output layer, implemented offline on a relatively slow computer. We presented here an implementation that is two orders of magnitude faster than a high-end personal computer. Note that the speed of our design can be further increased by thorough optimisation, which should allow the clock rate to be raised to 200 MHz. Once the maximum computational power achievable on an FPGA board is reached, an upgrade to an ASIC chip would offer the opportunity to advance towards GHz rates. The present work thus demonstrates the vast computational power that an FPGA chip can offer to the reservoir computing field.

Acknowledgements

We would like to thank Benoît Denègre and Ali Chichebor for helpful discussions. We acknowledge financial support by the Interuniversity Attraction Poles program of the Belgian Science Policy Office, under grant IAP P7-35 photonics@be, and by the Fonds de la Recherche Scientifique FRS-FNRS.

References

[1] IEEE Standard VHDL Language Reference Manual. ANSI/IEEE Std 1076-1993, 1994.

[2] The 2006/07 forecasting competition for neural networks & computational intelligence. http://www.neural-forecasting-competition.com/NN3/, 2006. (Date of access: 21.02.2014).

[3] L. Appeltant, M. C. Soriano, G. Van der Sande, J. Danckaert, S. Massar, J. Dambre, B. Schrauwen, C. R. Mirasso, and I. Fischer. Information processing using a single dynamical node as complex system. Nat. Commun., 2:468, 2011.

[4] G. B. Arfken. Mathematical Methods for Physicists. Academic Press, Orlando, FL, 1985.

[5] J. Bongard. Biologically inspired computing. IEEE Comp., 42:95–98, 2009.

[6] D. Brunner, M. C. Soriano, C. R. Mirasso, and I. Fischer. Parallel photonic information processing at gigabyte per second data rates using transient states. Nat. Commun., 4:1364, 2012.

[7] K. Caluwaerts, M. D'Haene, D. Verstraeten, and B. Schrauwen. Locomotion without a brain: physical reservoir computing in tensegrity structures. Artificial Life, 19(1):35–66, 2013.

[8] F. Duport, B. Schneider, A. Smerieri, M. Haelterman, and S. Massar. All-optical reservoir computing. Opt. Express, 20:22783–22795, 2012.

[9] B. Hammer, B. Schrauwen, and J. J. Steil. Recent advances in efficient learning of recurrent networks. In Proceedings of the European Symposium on Artificial Neural Networks, pages 213–216, Bruges (Belgium), April 2009.

[10] S. Haykin. Adaptive Filter Theory. Prentice-Hall, Upper Saddle River, New Jersey, 2000.

[11] N. D. Haynes, M. C. Soriano, D. P. Rosin, I. Fischer, and D. J. Gauthier. Reservoir computing with a single time-delay autonomous Boolean node. arXiv preprint arXiv:1411.1398, Nov. 2014.

[12] H. Jaeger. The "echo state" approach to analysing and training recurrent neural networks (with an erratum note). GMD Report 148, German National Research Center for Information Technology, 2001.

[13] H. Jaeger and H. Haas. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, 304:78–80, 2004.

[14] H. Jaeger, M. Lukoševičius, D. Popovici, and U. Siewert. Optimization and applications of echo state networks with leaky-integrator neurons. Neural Netw., 20:335–352, 2007.

[15] L. Larger, M. Soriano, D. Brunner, L. Appeltant, J. M. Gutiérrez, L. Pesquera, C. R. Mirasso, and I. Fischer. Photonic information processing beyond Turing: an optoelectronic implementation of reservoir computing. Opt. Express, 20:3241–3249, 2012.

[16] R. Legenstein, S. M. Chase, A. B. Schwartz, and W. Maass. A reward-modulated Hebbian learning rule can explain experimentally observed network reorganization in a brain control task. J. Neurosci., 30:8400–8410, 2010.
[17] M. Lukoševičius. A practical guide to applying echo
state networks. In Neural Networks: Tricks of the
Trade, pages 659–686. Springer, 2012.
[18] M. Lukoševičius and H. Jaeger. Survey: Reservoir
computing approaches to recurrent neural network
training. Comp. Sci. Rev., 3:127–149, 2009.
[19] M. Lukoševičius, H. Jaeger, and B. Schrauwen. Reser-
voir computing trends. Künst. Intell., 26:365–371,
2012.
[20] W. Maass, T. Natschläger, and H. Markram. Real-
time computing without stable states: A new frame-
work for neural computation based on perturbations.
Neural comput., 14:2531–2560, 2002.
[21] V. J. Mathews and J. Lee. Adaptive algorithms for bi-
linear filtering. In SPIE’s 1994 International Sympo-
sium on Optics, Imaging, and Instrumentation, pages
317–327. International Society for Optics and Photon-
ics, 1994.
[22] Y. Paquot, F. Duport, A. Smerieri, J. Dambre,
B. Schrauwen, M. Haelterman, and S. Massar. Opto-
electronic reservoir computing. Sci. Rep., 2:287, 2012.
[23] V. Pedroni. Circuit Design with VHDL. MIT Press,
2004.
[24] A. Rodan and P. Tino. Minimum complexity echo
state network. IEEE Trans. Neural Netw., 22:131–144,
2011.
[25] B. Schrauwen, M. D’Haene, D. Verstraeten, and J. V.
Campenhout. Compact hardware liquid state ma-
chines on FPGA for real-time speech recognition.
Neural Netw., 21:511–523, 2008.
[26] A. Smerieri, F. Duport, Y. Paquot, B. Schrauwen,
M. Haelterman, and S. Massar. Analog readout for
optical reservoir computers. In Advances in Neural
Information Processing Systems 25, pages 944–952.
Curran Associates, Inc., 2012.
[27] D. Sussillo and L. Abbott. Generating coherent pat-
terns of activity from chaotic neural networks. Neu-
ron, 63(4):544 – 557, 2009.
[28] F. Triefenbach, A. Jalalvand, B. Schrauwen, and J.-P.
Martens. Phoneme recognition with large hierarchical
reservoirs. Adv. Neural Inf. Process. Syst., 23:2307–
2315, 2010.
[29] D. Verstraeten, B. Schrauwen, and D. Stroobandt.
Reservoir computing with stochastic bitstream neu-
rons. In Proceedings of the 16th annual Prorisc work-
shop, pages 454–459, 2005.
[30] D. Verstraeten, B. Schrauwen, and D. Stroobandt.
Reservoir-based techniques for speech recognition. In
IJCNN’06. International Joint Conference on Neu-
ral Networks, pages 1050–1053, Vancouver, BC, July
2006.