The proposed methodology focuses on developing a BCI system by interconnecting an Emotiv EPOC+ headset, an Altera SoCKit Cyclone V SoC board, and a hexapod robot for validation. For this purpose, we created a database using basic MI movements (forward, backward, and stop) with four test subjects. The recognition process was carried out using a CNN-LSTM architecture. The operational system transforms the continuous MI-EEG signals into command instructions to control a hexapod robot’s locomotion.
2.2. A SoCKit Module Configuration
The SoCKit Cyclone V FPGA card, powered by an ARM Cortex® A9 Hard Processor System (HPS), was used to implement the EEG signal processing algorithms and the classifier.
Figure 2 shows the basic SoCKit functional blocks.
Communications were established between the processor and the FPGA core by configuring the Xillybus Intellectual Property (IP) core, as shown in Figure 3.
Writing of Emotiv EPOC+ data into a First-In, First-Out (FIFO) buffer is enabled (rd-en) when the buffer is empty. After reading the data, the Xillybus communicates with the processor core over the Advanced eXtensible Interface (AXI) bus, generating Direct Memory Access (DMA) requests on the central CPU bus. Simultaneously, the low-level FIFO (FPGA) is released (the full-en signal is low), and Xillybus carries the data from the processor core to the FPGA to control the hexapod. The project's Xillybus IP core was designed with four FIFOs, two for reading and two for writing data. Each FIFO was configured with a 32-bit data width, a data transmission latency of 5 ms, a bandwidth of 10 MB/s, and the buffering time set to autoset. Setting the buffering time to autoset and specifying the planned period of maximal processor deprivation forces the FPGA to manage the buffer RAM distribution internally for continuous reading and writing operations. The following equation gives the RAM size required for the DMA buffers’ flow:
$$\mathrm{RAM} = t \times BW$$
where $t$ is the buffering time and $BW$ is the expected data bandwidth.
For reading, all FIFOs must be empty and the enable signals (rd-en) activated. EEG data can then fill the FIFOs until they are all full and the empty signal is disabled. Since the FIFOs are 32 bits wide and the Emotiv EPOC+ device has a 14-bit resolution, each sample was zero-padded at the Most Significant Bit (MSB) positions. As in the reading procedure, writing is enabled (wr-en at the high level) when all write FIFOs are empty (low level). Therefore, a finite state machine was designed to control the FIFOs’ filling and emptying processes.
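The zero-padding step can be illustrated with a minimal sketch. It assumes that each sample is an unsigned 14-bit integer and that words are serialized in little-endian order; neither detail is stated in the text, and the function names are hypothetical.

```python
import struct

SAMPLE_BITS = 14   # Emotiv EPOC+ resolution stated in the text
WORD_BITS = 32     # FIFO data width stated in the text


def pad_sample(sample: int) -> int:
    """Zero-pad an unsigned 14-bit sample into a 32-bit word (upper 18 bits are 0)."""
    if not 0 <= sample < (1 << SAMPLE_BITS):
        raise ValueError("sample does not fit in 14 bits")
    return sample & 0xFFFFFFFF  # value is unchanged; the MSB positions stay zero


def pack_words(samples):
    """Serialize samples as 32-bit words (little-endian order is an assumption)."""
    return struct.pack("<%dI" % len(samples), *(pad_sample(s) for s in samples))


def unpack_words(payload: bytes):
    """Recover the list of 32-bit words from a byte stream."""
    count = len(payload) // (WORD_BITS // 8)
    return list(struct.unpack("<%dI" % count, payload))
```

Each 14-bit sample therefore occupies one full 32-bit FIFO word, which keeps the FIFO interface simple at the cost of 18 unused bits per sample.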
The EEG signal reading, processing, and classification algorithms were written in Python, Verilog, ANSI C (Nios® II Embedded Design Suite), and the Open Computing Language (OpenCL Standard) [28], and were tested and evaluated on the SoCKit.
Table 1 summarizes the SoCKit resources used in the implemented experiments.
FPGA outputs were wire-connected to the hexapod servo-control board. The Central Pattern Generator (CPG), based on discrete-time neural networks, was adapted to move the hexapod robot [29].
The locomotion law defined by the CPGs and derived from the discrete-time spiking neuronal model [30] is mathematically described by:
$$V_i[k+1] = \gamma\, V_i[k]\,\bigl(1 - Z_i[k]\bigr) + \sum_{j=1}^{N} W_{ij}\, Z_j[k] + I_i^{ext}$$
where $Z_i[k]$ is the firing state of the $i$th neuron at time $k$, $V_i$ is the membrane potential, $W_{ij}$ are the synaptic influences (weights), $I_i^{ext}$ is the external current, and $\gamma$ is a dimensionless parameter. Mainly, $Z_i[k]$ is defined as a thresholded Heaviside function.
Moreover, considering that twelve servomotors control the hexapod movements, twelve neurons were required in this model; the input current was not needed (i.e., $I_i^{ext} = 0$), and $\gamma = 1$ to emulate a linear integrator.
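As an illustration of how such a network can generate a repeating firing pattern, the following sketch simulates twelve neurons with no external current and a unit leak parameter. It assumes the common form of the discrete-time spiking model, V_i[k+1] = gamma * V_i[k] * (1 - Z_i[k]) + sum_j W_ij * Z_j[k] + I_i, with Z_i[k] = 1 when V_i[k] reaches a threshold. The ring-shaped weight matrix, the threshold value, and the initial state are hypothetical and are not the gait weights used for the hexapod.

```python
def heaviside(v, theta):
    """Thresholded Heaviside function: 1 when v reaches the threshold, else 0."""
    return 1 if v >= theta else 0


def step(V, W, gamma=1.0, theta=1.0, I_ext=None):
    """One update of the assumed discrete-time spiking model.

    Z_i[k]   = H(V_i[k] - theta)
    V_i[k+1] = gamma * V_i[k] * (1 - Z_i[k]) + sum_j W[i][j] * Z_j[k] + I_i
    """
    n = len(V)
    if I_ext is None:
        I_ext = [0.0] * n  # no external current
    Z = [heaviside(v, theta) for v in V]
    V_next = [
        gamma * V[i] * (1 - Z[i]) + sum(W[i][j] * Z[j] for j in range(n)) + I_ext[i]
        for i in range(n)
    ]
    return V_next, Z


def ring_weights(n):
    """Hypothetical weights: neuron j excites neuron (j + 1) % n only."""
    return [[1.0 if j == (i - 1) % n else 0.0 for j in range(n)] for i in range(n)]


def simulate(n=12, steps=24):
    """Return the firing states Z[k] for each time step (a spike raster)."""
    W = ring_weights(n)
    V = [1.0] + [0.0] * (n - 1)  # only neuron 0 starts at threshold
    raster = []
    for _ in range(steps):
        V, Z = step(V, W)
        raster.append(Z)
    return raster
```

With these illustrative weights, a single spike travels around the ring and the pattern repeats every twelve steps. A locomotion gait would instead use a weight matrix chosen so that the firing order matches the desired servomotor sequence.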
2.3. BCI Dataset
The test subjects provided written consent to capture the EEG signals after carefully reading the experimental protocol to protect confidentiality. The specialized equipment used in the experiment was entirely commercial, not presenting any potential risk to the participants. Seven subjects were initially selected for the training process, and after addressing the defined paradigm, they followed an individual schedule.
Before and during each training session, the Emotiv Software Development Kit (Emotiv Xavier) monitored the subject’s cognitive and emotional performances [15]. A dataset was then created by selecting four test subjects between 23 and 36 years old, who were trained and supervised to collect signals during several experimental tasks lasting three seconds each. According to the given task, subjects were instructed to stay still during the capture and invited to imagine closing and opening the right or the left fist while focusing on a stimulus video (Figure 5).
The stimulus video of the fist closing-opening movements was played on the screen according to the temporal task sequence shown in Figure 6.
In the capture sequence, the first five seconds served to prepare the test subject; this phase ended with the audible Beep 1, followed by Task 1, related to the left fist MI action. Beep 2 concluded this period and marked a pause of 3 s. Beep 3 ended this static period and started a second preparation phase of 2 s. Beep 4 started Task 2, related to the right fist MI task, which ended with Beep 5. The developed dataset consists of 2400 trials performed by four subjects (600 trials from each subject), representing 2400 × 19 s (12.67 h) of data capture. From each trial, only the 3 s of signal corresponding to Task 1 (left fist MI), the 3 s corresponding to Task 2 (right fist MI), and 3 s of neutral action were retained to build the dataset.
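The timing described above can be checked with a short sketch that also shows how task windows might be cut from a trial. The phase boundaries follow the beep sequence in the text, and the phases itemized there cover 16 s of the 19 s trial. The sampling rate `FS_HZ` is an assumed value that is not given in this section, and the helper names are hypothetical.

```python
TRIAL_SECONDS = 19          # per-trial duration stated in the text
TRIALS_PER_SUBJECT = 600
SUBJECTS = 4
FS_HZ = 128                 # assumed sampling rate, not stated in this section

# (label, start_s, end_s) following the beep sequence described in the text.
PHASES = [
    ("preparation_1", 0, 5),
    ("task_1_left_fist", 5, 8),
    ("pause", 8, 11),
    ("preparation_2", 11, 13),
    ("task_2_right_fist", 13, 16),
]


def total_capture_hours():
    """Total recording time for all subjects and trials, in hours."""
    return SUBJECTS * TRIALS_PER_SUBJECT * TRIAL_SECONDS / 3600.0


def window(samples, start_s, end_s, fs=FS_HZ):
    """Return the samples recorded between start_s and end_s (in seconds)."""
    return samples[int(start_s * fs):int(end_s * fs)]


def task_windows(samples, fs=FS_HZ):
    """Cut the two MI task windows out of one single-channel trial."""
    bounds = {label: (start, end) for label, start, end in PHASES}
    return {
        label: window(samples, *bounds[label], fs=fs)
        for label in ("task_1_left_fist", "task_2_right_fist")
    }
```

Under the assumed sampling rate, each 3 s task window contains 384 samples per channel.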
2.4. Data Preprocessing
Signals from the F3, F4, FC5, and FC6 sensors were processed in the MI recognition process [31,32]. These sensors are located over the rear portion of the frontal lobe, as shown in Figure 7.
The Emotiv EPOC+ headset was configured with three filters: a low-pass filter with a cutoff frequency at 85 Hz, an operational bandwidth between 0.16 and 43 Hz, and a band-rejection filter with a stop-band between 50 and 60 Hz [15,33]. The project paradigm is based on processing the mu rhythm, which, according to the International EEG Waveform Society, occupies frequencies between 8 and 12 Hz [34]. The mu rhythm is the most used pattern in BCI systems considering the nature of the MI movements [35,36]. Thus, the mental imagery of body members’ mobility can be perceived through the mu rhythm variations at the sensorimotor cortex, avoiding any real movement of the body limbs [37]. Lotze et al. determined that the left and right hands’ physical movements cause an Event-Related Desynchronization (ERD) of the mu rhythm power, captured in different motor cortex areas [38]. Consequently, the F3 and FC5 sensors were selected for the left hemisphere, whereas the F4 and FC6 sensors were selected for the right hemisphere on the sensorimotor cortex. This choice takes into account the sensors’ closeness to the primary motor cortex location associated with the imagined and physical movements of the left and right hands [31,32].
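To make the notion of mu rhythm desynchronization concrete, the following sketch estimates the power in the 8 to 12 Hz band with a direct discrete Fourier transform and expresses the change between a reference and an activity window as a percentage. This is an illustrative computation, not the processing chain used in this work (the network operates on the signals directly); the function names and the 128 Hz sampling rate in the usage note are assumptions.

```python
import math


def band_power(signal, fs, f_low=8.0, f_high=12.0):
    """Mean squared DFT magnitude over the bins that fall inside [f_low, f_high]."""
    n = len(signal)
    power = 0.0
    bins = 0
    for k in range(1, n // 2):
        freq = k * fs / n
        if f_low <= freq <= f_high:
            re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(signal))
            im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(signal))
            power += (re * re + im * im) / (n * n)
            bins += 1
    return power / bins if bins else 0.0


def erd_percent(reference_power, activity_power):
    """Relative band-power change in percent; negative values indicate desynchronization."""
    return 100.0 * (activity_power - reference_power) / reference_power


def sine(amplitude, freq_hz, fs, seconds):
    """Synthetic test tone used only to exercise the functions above."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / fs) for t in range(int(fs * seconds))]
```

For example, halving the amplitude of a synthetic 10 Hz tone (3 s at an assumed 128 Hz) reduces the band power to one quarter, so `erd_percent` returns approximately -75.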
2.5. MI-EEG Signals’ Classification Based on a CNN-LSTM Architecture
Recurrent neural networks (e.g., LSTM networks) are composed of memory units that temporarily store information [39]. Such a network’s layer structure is not unique because the interconnections between neurons are not based on a transportable (mutable) logic. The feature extraction and classification of EEG signals are performed by combining two neural schemes, the CNN and the LSTM. Figure 8 presents the CNN-LSTM architecture integrated into the SoCKit to decode robot commands. The overall network consists of the following sequence of layers: a convolutional layer (CNN1), an LSTM layer (LSTM1), a convolutional layer (CNN2) followed by a max-pooling layer, a convolutional layer (CNN3), an LSTM layer (LSTM2), and a dense layer.
A
matrix was applied as the input to the CNN1 layer, which performed 32 convolutions with a
size kernels. In each convolutional layer (CNN1, CNN2, and CNN3), the padding parameter was configured so that the input and output data keep the same temporal dimension. Weights were initialized according to a uniform distribution using the He initialization algorithm [40]. Dropout was applied with rates of 0.4, 0.2, 0.2, and 0.1 to the CNN1, CNN2, CNN3, and LSTM2 layers, respectively. According to the deep learning software interface Keras [41], a dropout rate of 0.1 means that 10% of the input units are randomly set to zero during the training phase, which reduces overfitting (overtraining).
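The effect of the dropout rate can be sketched in a few lines of plain Python. The function below mimics the inverted-dropout behavior documented for the Keras `Dropout` layer during training (units are zeroed with probability `rate` and the remaining ones are scaled by 1 / (1 - rate)); it is an illustration only, not the Keras implementation, and the function name is hypothetical.

```python
import random


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - rate)  # keeps the expected sum of the outputs unchanged
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]
```

With a rate of 0.1, roughly one unit in ten is zeroed at each training step and the surviving activations are multiplied by about 1.11; at inference time dropout is disabled and the values pass through unchanged.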
The LSTM1 and LSTM2 layers contain 32 and 150 cells, respectively, and receive feature matrices from the convolutional layers for processing. The model was implemented in Keras and TensorFlow using the categorical cross-entropy loss function to evaluate the error between the estimated outputs and the ground truth. The network was trained for 8000 epochs to reach the maximum accuracy, using the Nesterov-accelerated Adaptive Moment Estimation (NADAM) optimizer with a batch size of 512. A cyclical learning rate with a step size of nine and minimum and maximum learning rates of 0.000001 and 0.0005, respectively, was used to speed up training [42]. The convolutional layers used the leaky Rectified Linear Unit (ReLU) as the activation function with slope parameter $\alpha$. This yields a small non-zero gradient when a neuron has a negative net input. The leaky ReLU activation function $f(x)$ is defined by:
$$f(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases}$$
where $\alpha$ is a small positive constant [43]. In contrast, SoftMax was used as the activation function of the fully connected layer following the LSTM2 layer to normalize the outputs, such that they may be interpreted as class probabilities [44].
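A cyclical learning rate schedule with the stated bounds can be sketched as follows. The sketch assumes the triangular policy, in which the rate rises linearly from the minimum to the maximum over one step size and then falls back over the next; the text does not state which policy was used or whether the step size is counted in iterations or epochs, and the function name is hypothetical.

```python
import math


def triangular_clr(iteration, step_size=9, base_lr=1e-6, max_lr=5e-4):
    """Triangular cyclical learning rate (policy assumed, bounds taken from the text)."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)
```

Under this assumption, the rate starts at 0.000001, peaks at 0.0005 after nine steps, and returns to 0.000001 after eighteen steps, after which the cycle repeats.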
Table 2 lists the principal parameters of the proposed network model.
The convolutional layer CNN1 has only 416 parameters, while the CNN2 and CNN3 layers have 3104 parameters each. In total, the network has 125,197 parameters across all layers. It must be highlighted that CNNs do not require a specially designed feature extraction stage because they can perform adaptive feature extraction directly on raw input data. The output used a fully connected layer with SoftMax as the activation function, which produces the three class probabilities. During the neural network training, the neuron weights were randomly initialized using the He initialization algorithm.
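The reported parameter counts can be reproduced with the standard formulas for convolutional, LSTM, and dense layers. The sketch below assumes one-dimensional convolutions with a kernel size of 3, four input channels (one per selected sensor), and 32 filters in every convolutional layer; these values are inferred because they match the reported counts, and they are not stated explicitly in the text.

```python
def conv1d_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel_size * in_channels + 1) * filters


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and biases."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Weights plus one bias per output unit."""
    return (input_dim + 1) * units


layer_params = {
    "CNN1": conv1d_params(3, 4, 32),      # 416
    "LSTM1": lstm_params(32, 32),         # 8320
    "CNN2": conv1d_params(3, 32, 32),     # 3104
    "CNN3": conv1d_params(3, 32, 32),     # 3104
    "LSTM2": lstm_params(32, 150),        # 109800
    "Dense": dense_params(150, 3),        # 453
}
total_params = sum(layer_params.values())  # 125197
```

Under these assumptions the layer-wise counts sum to 125,197, with the LSTM2 layer accounting for most of the parameters. Max-pooling and dropout layers add no trainable parameters.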