demonstrated in [3, 15, 22, 8, 6] where experimental im- [...] nection matrix a_ij. The resulting evolution equations are:

x_0(n+1) = \sin(\alpha x_N(n-1) + \beta b_0 u(n)),   (2a)
x_i(n+1) = \sin(\alpha x_{i-1}(n) + \beta b_i u(n)),   (2b)

i = 1, ..., N − 1, where α and β are the feedback and input gain parameters, and the input mask b_i is drawn from a uniform distribution over the interval [−1, +1], as in [24, 22, 8].

The output y(n) of the reservoir computer is a linear combination of the states of the variables:

y(n) = \sum_{i=0}^{N-1} w_i x_i(n),   (3)

where w_i are the readout weights. The aim is to get the output signal as close as possible to some target signal d(n). For that purpose, the readout weights are trained either offline (using standard linear regression methods), or online, as described in section 4.

3. Channel equalisation

We focus on a single task to demonstrate the implementation of an online training algorithm on an FPGA board. We chose to work on a channel equalisation task introduced in [21]. This is a popular choice in the reservoir computing community, as it doesn't require the use of large reservoirs to obtain state-of-the-art performance, and it was used in e.g. [13, 24, 22, 8].

The aim is to reconstruct the input of a noisy nonlinear wireless communication channel from its output u(n). The input d(n) ∈ {−3, −1, 1, 3} is subject to symbol interference:

q(n) = 0.08 d(n+2) - 0.12 d(n+1) + d(n) + 0.18 d(n-1) - 0.1 d(n-2) + 0.091 d(n-3) - 0.05 d(n-4) + 0.04 d(n-5) + 0.03 d(n-6) + 0.01 d(n-7),   (4)

and nonlinear distortion:

u(n) = q(n) + 0.036 q^2(n) - 0.011 q^3(n) + \nu(n),   (5)

where ν(n) = A · r(n) is the noise with amplitude A and r(n) are independent random numbers drawn (for ease of implementation on the FPGA board) from a uniform distribution over the interval [−1, +1]. In our experiments, we considered three levels of noise: low, medium and high, with amplitudes A_low = 0.1, A_medium = 0.5 and A_high = 1.0. These values were chosen for the sake of comparison with the 20 dB, 13 dB and 10 dB signal-to-noise ratios reported in [22, 8] (with Gaussian noise).

[Figure labels: Input layer, Reservoir, Output layer.]

The aim of the channel equaliser is to reconstruct d(n) from u(n). The performance of the equaliser is measured using the Symbol Error Rate (SER), that is, the fraction of symbols that are wrongly reconstructed. For comparison, a 50-neuron reservoir computer with a sine nonlinearity can achieve a symbol error rate as low as 1.6 × 10^−4 for a channel with a very low Gaussian noise level (SNR = 32 dB), see [22].

In wireless communications, the environment has a great impact on the quality of the signal. Given its highly variable nature, the properties of the channel may be subject to important changes in real time. To demonstrate the ability of the online training method to adapt to these variations, we studied the situation where the channel changes in real time. For this application we set the noise to zero, and considered that the channel is slightly modified, with equation (5) replaced by three slightly different channels:

u_1(n) = 1.00 q(n) + 0.036 q^2(n) - 0.011 q^3(n),   (6a)
u_2(n) = 0.85 q(n) + 0.036 q^2(n) - 0.011 q^3(n),   (6b)
u_3(n) = 0.70 q(n) + 0.036 q^2(n) - 0.011 q^3(n).   (6c)

We regularly switched from one channel to another. The results of this experiment are presented in section 7.2.

4. Gradient descent algorithm

Taking into account the linearity of the readout layer, finding the optimal readout weights w_i is very simple [13]. After having collected all the reservoir states x_i(n), the problem is reduced to solving a system of N linear equations. However, this so-called offline training method requires the use of a large set of inputs and reservoir states. For state-of-the-art experimental performance, one needs to generate thousands of symbols and thus hundreds of thousands of reservoir states. In our experiments [22, 8], the internal memory size of the arbitrary waveform generators used for this task can become the limiting factor for this method.

Here we use an entirely different approach. Instead of collecting and post-processing data, we generate data and train the reservoir in real time. That is, the inputs are generated one by one, and for every reservoir output a small correction is applied to the readout weights, proportional to the error between the reservoir output and the target output. Our approach falls within the online training category and we use the gradient descent algorithm to compute the corrections to the readout weights (already used in [7]).

The gradient or steepest descent method is a way to find a local minimum of a function using its gradient [4]. For the channel equalisation task, this method provides the following rule for updating the readout weights:

w_i(n+1) = w_i(n) + \lambda (d(n) - y(n)) x_i(n).   (7)
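As an illustration, the channel of equations (4) and (5), the reservoir of equations (2a) and (2b) and the weight update of equation (7) can be sketched in a few lines of Python. This is a minimal floating-point sketch, not the FPGA design. The values of α, β, λ, the reservoir size and the random seed are placeholders chosen for the example. The feedback term of (2a) is read as the last neuron delayed by one step, the state obtained after feeding u(n) is the one used to estimate d(n), and symbols outside the generated sequence are taken as zero; all three are assumptions.

```python
import numpy as np

SYMBOLS = np.array([-3.0, -1.0, 1.0, 3.0])
# Coefficients of equation (4), ordered from d(n+2) down to d(n-7).
TAPS = np.array([0.08, -0.12, 1.0, 0.18, -0.1, 0.091, -0.05, 0.04, 0.03, 0.01])


def channel(d, amplitude, rng):
    """Equations (4)-(5); symbols outside the sequence are taken as zero."""
    full = np.convolve(d, TAPS)          # full[m] = sum_k TAPS[k] * d[m - k]
    q = full[2:2 + len(d)]               # q(n) = full[n + 2]
    noise = amplitude * rng.uniform(-1.0, 1.0, size=len(d))
    return q + 0.036 * q**2 - 0.011 * q**3 + noise


def reservoir_step(x, last_prev, u, mask, alpha, beta):
    """Equations (2a)-(2b): x is the state at step n, last_prev the last neuron at n - 1."""
    new = np.empty_like(x)
    new[0] = np.sin(alpha * last_prev + beta * mask[0] * u)
    new[1:] = np.sin(alpha * x[:-1] + beta * mask[1:] * u)
    return new


def online_update(w, x, d, lam):
    """Equation (7): correct the weights in proportion to the output error."""
    y = w @ x
    return w + lam * (d - y) * x, y


def nearest_symbol(y):
    """Map the analogue output to the closest of the four symbols."""
    return SYMBOLS[np.argmin(np.abs(SYMBOLS - y))]


def run(n_symbols=20000, n_neurons=50, alpha=0.9, beta=0.1, lam=0.01,
        amplitude=0.1, seed=0):
    """Train online on one symbol stream; return the SER over the last 10 % of it."""
    rng = np.random.default_rng(seed)
    d = rng.choice(SYMBOLS, size=n_symbols)
    u = channel(d, amplitude, rng)
    mask = rng.uniform(-1.0, 1.0, size=n_neurons)
    x, last_prev, w = np.zeros(n_neurons), 0.0, np.zeros(n_neurons)
    wrong = np.zeros(n_symbols, dtype=bool)
    for n in range(n_symbols):
        # The right-hand side is evaluated with the old state before assignment.
        x, last_prev = reservoir_step(x, last_prev, u[n], mask, alpha, beta), x[-1]
        w, y = online_update(w, x, d[n], lam)
        wrong[n] = nearest_symbol(y) != d[n]
    return wrong[-(n_symbols // 10):].mean()
```

Calling `run()` returns the fraction of wrongly reconstructed symbols at the end of training. The step size is kept constant here for simplicity, and the placeholder parameters are not tuned, so the error rates of this sketch should not be compared with the experimental ones.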
The step size λ should be small enough not to overshoot the minimum at every step, but big enough for a reasonable convergence time. In practice, we start with a high value λ_0, and then gradually decrease it during the training phase until a minimum value λ_min is reached, according to equation (8).

Note that we didn't devote any effort to performance optimisation and these results are presented only for the sake of comparison. Our goal was to implement a simple online training algorithm on an embedded system, with no intention of competing against a regular CPU in terms of execution speed.
[Figure 2 diagram. Block labels: Sim, Discret, Out, Update, Check, Train, IP Cores, USB. Signal labels: d(n), x_i(n), x_0(n), ..., x_{N-1}(n), w_i(n), w_i(n + 1), y(n).]

Figure 2. Simplified schematic of our design. The Sim entity simulates the nonlinear channel and the reservoir. It generates reservoir states x_i(n) and target outputs d(n). The RC entity produces the reservoir output y(n) after calculating the linear combination of discretised reservoir states and readout weights w_i(n). The Train entity implements the gradient descent algorithm to update the readout weights w_i(n + 1) and measures the performance of the reservoir in terms of symbol error rate (SER). The IP Cores box contains proprietary code for transferring data from the board to the computer.
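The readout on discretised states performed by the RC entity can be illustrated with a short Python sketch. It uses the 16-bit fixed-point format given in the text (1 sign bit, 4 integer bits, 11 fractional bits). The two's complement encoding, the round-to-nearest conversion, the saturation at the range limits and the single rescaling of the accumulated sum are assumptions made for the example; the text does not say how the design handles them.

```python
import numpy as np

FRAC_BITS = 11                               # fractional bits
SCALE = 1 << FRAC_BITS                       # 2048 steps per unit
Q_MIN, Q_MAX = -(1 << 15), (1 << 15) - 1     # 16-bit signed range (assumed two's complement)


def to_fixed(values):
    """Round to the nearest representable value and saturate at the 16-bit limits."""
    raw = np.rint(np.asarray(values, dtype=float) * SCALE).astype(np.int64)
    return np.clip(raw, Q_MIN, Q_MAX)


def to_float(raw):
    """Convert raw fixed-point integers back to real numbers."""
    return np.asarray(raw, dtype=float) / SCALE


def readout(states, weights):
    """y(n) = sum_i w_i x_i(n), accumulated on integers and rescaled once."""
    acc = np.sum(to_fixed(weights) * to_fixed(states))   # carries a factor SCALE**2
    return to_fixed(acc / SCALE**2)
```

With this format the representable range is [−16, 16 − 2^−11] with a resolution of 2^−11, so `to_float(readout(x, w))` differs from the floating-point sum by the accumulated rounding of the states and weights.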
[...] vides the resulting reservoir output.

The training entity Train contains the implementation of the gradient descent algorithm. The Update process calculates the new readout weights and decreases the value [...] bols and output to the IP Cores, which transfers the data [...]

To take into account the possible variations of the communications channel (as described in section 3), the Update process monitors the performance of the reservoir computer. If it detects an increase of the symbol error rate above a certain threshold (we used SER_th = 3 × 10^−2) after the training is complete (that is, when λ = λ_min), the λ parameter is reset to λ_0 and the training starts over again.

Precision and the representation of real numbers are of key importance when programming arithmetic operations on a bit-logic device. In our design, we implemented a fixed-point representation with 16-bit precision (1 bit for the sign, 4 bits for the integer part and 11 bits for the fractional part).

Synthesis reports that our design uses only 7% of the board's slice LUTs for logic and arithmetic operations. The block RAM usage is much higher, at 77%, as it is used to store data from the board before transferring it through the slow USB port. The current design is driven by a 33 MHz clock, with little effort devoted to optimisation and critical path analysis. In the future, timing performance will be improved by rewriting the design to take maximum advantage of the FPGA's concurrent logic.

[Figure 3 plot. Curves: low noise, medium noise, high noise. Vertical axis: SER (×10^−3). Upper horizontal axis: symbols (×10^3). Lower horizontal axis: time.]

Figure 3. Training results for various noise levels, expressed in terms of symbol error rate. Horizontal axes show both time since the beginning of the experiment (lower axis) and the number of symbols n processed by the FPGA (upper axis).

7. Results

7.1. Constant channel

Figure 3 shows the symbol error rates captured from the FPGA board for different noise levels. Error rates are plotted against time, as well as the number of reservoir computer outputs. The following parameter values were used for the gradient descent algorithm:

\lambda_0 = 0.4, \quad \lambda_{min} = 0, \quad \gamma = 0.999,   (9)

and the λ parameter was updated every 100 outputs (i.e. k = 100 in equation (8)). The asymptotic error rate is reached at approximately t = 0.4 s. The noise level in the channel has no impact on the training time, but increases the final symbol error rate. For a low-noise channel (with A_n = 0.1), the symbol error rate is as low as 5.8 × 10^−3, that is, 5 to 6 errors per one thousand symbols. For medium
[Figure 4 plot. Curves: slow training, optimal training, fast training. Vertical axis: SER (×10^−3). Horizontal axes: symbols (×10^3) and time (s).]

Figure 4. Various training speeds depending on the update rate k. A fast training, with k = 10, gives a high SER of 1.55 × 10^−2, while a slow training, with k = 1000, produces a very low SER of 2.6 × 10^−3 at the cost of a longer convergence time. The optimal training with k = 100 is a compromise between performance and speed.

[Figure 5 plot. Curves: λ and SER, with intervals labelled Channel (6a), Channel (6b) and Channel (6c). Horizontal axes: RC outputs (×10^3) and time (s).]

Figure 5. Symbol error rate produced by the FPGA in the case of a variable communications channel. The channels are switched at preprogrammed times. The change in channel is followed immediately by a steep increase of the SER. The λ parameter is automatically reset to λ_0 = 0.4 every time a performance degradation is detected, and then returns to its minimum value as the equaliser adjusts to the new channel, bringing the SER down to its asymptotic value. The final value of the SER depends on the channel: the more non-linear the channel, the higher the resulting SER.
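The reset behaviour visible in Figure 5 can be sketched in Python as a small step-size scheduler. The default values of λ_0, λ_min, γ, k and SER_th are those quoted in the text. The decay rule itself is an assumption: the sketch lets λ approach λ_min geometrically with factor γ every k outputs, which may differ from equation (8). Measuring the SER over the same window of k outputs, and treating training as complete once λ is within a small tolerance `tol` of λ_min, are also assumptions; the class and its method names are hypothetical.

```python
class StepSizeSchedule:
    """Decaying step size with a reset on performance degradation (assumed decay rule)."""

    def __init__(self, lam0=0.4, lam_min=0.0, gamma=0.999, k=100,
                 ser_threshold=3e-2, tol=1e-6):
        self.lam0, self.lam_min, self.gamma, self.k = lam0, lam_min, gamma, k
        self.ser_threshold, self.tol = ser_threshold, tol
        self.lam = lam0
        self.outputs = 0     # outputs seen in the current window
        self.errors = 0      # symbol errors seen in the current window

    def converged(self):
        """Training counts as complete once lam is within tol of lam_min."""
        return self.lam - self.lam_min <= self.tol

    def step(self, symbol_error):
        """Register one reservoir output; return the step size for the next update."""
        self.outputs += 1
        self.errors += int(symbol_error)
        if self.outputs == self.k:
            ser = self.errors / self.k
            if self.converged() and ser > self.ser_threshold:
                self.lam = self.lam0      # degradation after training: start over
            else:
                # Assumed decay: geometric approach to lam_min with factor gamma.
                self.lam = self.lam_min + self.gamma * (self.lam - self.lam_min)
            self.outputs = 0
            self.errors = 0
        return self.lam
```

In a training loop, `lam = schedule.step(decision != d_n)` would be called once per reservoir output and the returned value used in the weight update of equation (7). Errors made while λ is still decaying do not trigger a reset, which matches the condition in the text that the check applies only after training is complete.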
