Leveraging deep learning for robust EEG analysis in mental health monitoring

Zixiang Liu1* and Juan Zhao2
1 Anhui Vocational College of Grain Engineering, Hefei, China, 2 Hefei University, Hefei, China

REVIEWED BY
Jiancai Leng, Qilu University of Technology, China
Riasat Khan, North South University, Bangladesh
Jin Xie, Chinese Academy of Sciences (CAS), China

*CORRESPONDENCE
Zixiang Liu
[email protected]

RECEIVED 11 September 2024
ACCEPTED 02 December 2024
PUBLISHED 03 January 2025

CITATION
Liu Z and Zhao J (2025) Leveraging deep learning for robust EEG analysis in mental health monitoring. Front. Neuroinform. 18:1494970. doi: 10.3389/fninf.2024.1494970

COPYRIGHT
© 2025 Liu and Zhao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Introduction: Mental health monitoring utilizing EEG analysis has garnered notable interest due to the non-invasive characteristics and rich temporal information encoded in EEG signals, which are indicative of cognitive and emotional conditions. Conventional methods for EEG-based mental health evaluation often depend on manually crafted features or basic machine learning approaches, like support vector classifiers or superficial neural networks. Despite the potential of these approaches, they often fall short in capturing the intricate spatiotemporal relationships within EEG data, leading to lower classification accuracy and poor adaptability across various populations and mental health scenarios.

Methods: To overcome these limitations, we introduce the EEGMind-Transformer, an innovative deep learning architecture composed of a Dynamic Temporal Graph Attention Mechanism (DT-GAM), a Hierarchical Graph Representation and Analysis (HGRA) module, and a Spatial-Temporal Fusion Module (STFM). The DT-GAM is designed to dynamically extract temporal dependencies within EEG data, while the HGRA models the brain's hierarchical structure to capture both localized and global interactions among different brain regions. The STFM synthesizes spatial and temporal elements, generating a comprehensive representation of EEG signals.
Results and discussion: Our empirical results confirm that the EEGMind-Transformer significantly surpasses conventional approaches, achieving an accuracy of 92.5%, a recall of 91.3%, an F1-score of 90.8%, and an AUC of 94.2% across several datasets. These findings underline the model's robustness and its generalizability to diverse mental health conditions. Moreover, the EEGMind-Transformer not only pushes the boundaries of state-of-the-art EEG-based mental health monitoring but also offers meaningful insights into the underlying brain functions associated with mental disorders, solidifying its value for both research and clinical settings.
KEYWORDS
EEG, mental health monitoring, transformer, application of EEG, neural electrical signals
1 Introduction
Monitoring mental health through electroencephalography (EEG) has become an
increasingly important area of research due to the growing recognition of mental health
issues and the need for non-invasive, objective, and continuous monitoring methods. EEG,
with its ability to capture the brain’s electrical activity in real-time, offers unique insights
into the neural processes underlying various mental health conditions (Michelmann et al.,
2020). Not only can EEG provide a window into the brain’s functioning, but it can
also help in the early detection and management of mental disorders (Gao et al., 2021).
Furthermore, EEG-based monitoring is crucial for developing personalized treatment
plans and improving patient outcomes, making it a vital tool in both clinical and research
settings (Cassani et al., 2018). The significance of EEG in mental health monitoring lies
not only in its diagnostic potential but also in its ability to track changes over time, offering a dynamic view of the brain's response to treatment and environmental factors (Krigolson et al., 2017). As mental health issues continue to rise globally, the need for effective monitoring tools like EEG has never been more critical (Goswami et al., 2022).

Traditional machine learning methods in EEG analysis, while foundational, exhibit several critical limitations that hinder their effectiveness in mental health monitoring applications. Early methods predominantly relied on handcrafted features extracted from EEG signals, such as power spectral density and coherence (Kumar and Mittal, 2018). These features, although useful, capture only a limited view of the rich information contained within EEG signals. Specifically, handcrafted features often focus on static, time-averaged characteristics, neglecting the complex and dynamic temporal dependencies present in EEG data. Such simplifications are inadequate for understanding the rapidly fluctuating brain activities that are essential for accurate mental health monitoring. Moreover, conventional classifiers like support vector machines (SVMs) and k-nearest neighbors (k-NN) often struggle to model the intricate spatial relationships across multiple EEG channels, which are crucial for detecting patterns associated with mental health conditions (Delorme and Makeig, 2004). These algorithms treat the signals from each electrode independently or rely on shallow features that fail to account for inter-channel dependencies. As a result, they lack the capacity to capture the synchronized activity patterns across brain regions, which are vital for identifying neural biomarkers related to mood, anxiety, or cognitive states. Another significant limitation is the inability of traditional machine learning models to effectively handle the high-dimensional and non-linear nature of EEG data. Methods like SVMs and k-NN typically perform well only in controlled, small-scale settings where data variability is minimized. When applied to larger, real-world EEG datasets, these models tend to exhibit poor generalizability due to their limited capacity to handle the variability and noise inherent in complex EEG signals (Lotte et al., 2018; Craig and Tran, 2020; Alhussein et al., 2019). This restricts their application to controlled laboratory environments, rendering them less useful for real-time, in-the-wild mental health monitoring, where factors such as individual differences, movement artifacts, and environmental noise are present. Traditional machine learning approaches in EEG analysis are thus constrained by their reliance on handcrafted features, their inability to capture both spatial and temporal complexities, and their limited adaptability to high-dimensional, noisy datasets. These limitations underscore the need for more advanced methods capable of leveraging the full spatio-temporal dynamics of EEG signals to enhance the accuracy and robustness of mental health monitoring in diverse real-world contexts.

Recent innovations have turned to graph-based deep learning methods, particularly Graph Convolutional Networks (GCNs) (Roy et al., 2019) and Graph Attention Networks (GATs) (Kwon et al., 2019), to address the unique structural properties of EEG data. GCNs are highly effective in representing the brain as a graph of interconnected regions, allowing models to capture both localized and global patterns of neural activity (Zhao et al., 2019). For instance, Craik et al. (2019) demonstrated the utility of GCNs in brain network analysis, highlighting their capability to model hierarchical dependencies. However, these methods often struggle to integrate temporal dynamics effectively. Attention mechanisms, particularly spatiotemporal attention models, have further refined the ability to extract critical features from EEG data. These methods dynamically assign weights to relevant time points and spatial regions, enhancing interpretability and robustness. When combined with GCNs, attention mechanisms provide a powerful framework for modeling the brain's complex and dynamic activity (Tsiouris et al., 2018). Multimodal approaches incorporating EEG with other physiological signals, such as electromyography (EMG) and electrocardiography (ECG), have also shown promise. These methods offer a holistic view of mental states, combining complementary data to improve classification accuracy and resilience against noise. Studies have demonstrated their potential in contexts like stress detection and cognitive load estimation, where single-modality approaches may falter (Sturm et al., 2021).

The field has recently advanced toward more sophisticated models that integrate machine learning techniques with personalized and remote health monitoring. This latest phase has witnessed the adoption of graph-based models, such as Graph Convolutional Networks (GCNs) and Graph Attention Networks (GATs), which are particularly suited for EEG data due to their ability to model the brain's complex network structure (Parisot et al., 2018). These models capture both localized and global patterns of connectivity across brain regions, creating a nuanced understanding of spatial interactions (Song et al., 2021). By incorporating temporal dynamics into these graphs, researchers have developed interpretable, multi-scale models that better support personalized mental health monitoring (Sihag et al., 2022). This phase also emphasizes scalability, allowing EEG-based models to be deployed in practical settings. However, issues such as data privacy, ethical considerations, and continuous improvements in accessibility remain important challenges (Plis et al., 2018; Varatharajan et al., 2022; Stahl et al., 2019).

While traditional machine learning methods and early deep learning models struggle to capture the dynamic, multi-scale dependencies in EEG data, recent graph-based and attention-driven approaches have only partially bridged this gap by focusing on either spatial or temporal aspects independently. Additionally, these methods often lack scalability and adaptability to personalized, real-world applications, especially in resource-limited settings where data privacy and interpretability are paramount concerns. Our EEGMind-Transformer model addresses these limitations by integrating advanced graph-based neural networks with temporal attention mechanisms, allowing the model to simultaneously capture intricate spatiotemporal patterns within EEG data. This comprehensive approach not only enhances interpretability by offering insight into specific brain region interactions relevant to mental health but also improves scalability for remote and clinical applications through a structure that is adaptable across different datasets and user scenarios. By effectively bridging the gaps in current methods, the EEGMind-Transformer provides a robust, scalable, and interpretable solution tailored for personalized and continuous mental health monitoring. Our contributions are summarized as follows:

• The EEGMind-Transformer introduces the Dynamic Temporal Graph Attention Mechanism (DT-GAM), the Hierarchical Graph Representation and Analysis (HGRA) module, and the Spatial-Temporal Fusion Module (STFM) to effectively capture complex spatiotemporal dependencies in EEG data.
• The method is highly versatile and efficient, suitable for various scenarios, consistently delivering excellent performance across different mental health monitoring applications while offering model interpretability and scalability.
• Experimental results demonstrate that the EEGMind-Transformer significantly outperforms existing state-of-the-art methods across multiple datasets, achieving superior performance.

2 Related work

Traditional machine learning techniques have been extensively used in EEG analysis for mental health monitoring. These methods typically involve feature extraction followed by classification using algorithms such as Support Vector Machines (SVM), k-Nearest Neighbors (k-NN), Random Forests, and shallow neural networks (Lakshminarayanan et al., 2023). Feature extraction often relies on domain expertise to identify relevant features from the EEG signals, such as power spectral density, coherence, and wavelet coefficients, which are then fed into classifiers to distinguish between different mental states. While these approaches have shown some success, they are limited by their reliance on handcrafted features, which may not capture the full complexity of the EEG data (Hong et al., 2024). Moreover, traditional classifiers often struggle with the high dimensionality and variability of EEG signals, leading to issues with overfitting and poor generalization across different populations and recording conditions (Wan et al., 2023). Furthermore, these methods are typically static and cannot adequately model the temporal dynamics inherent in EEG signals, which are crucial for understanding cognitive and emotional processes. As a result, while traditional machine learning approaches have laid the groundwork for EEG analysis, they are often insufficient for capturing the complex, non-linear patterns in the data that are essential for accurate mental health monitoring (LaRocco et al., 2023).

Deep learning approaches, by contrast, can learn spatial and temporal features simultaneously (Lakshminarayanan et al., 2023). While these methods have improved performance over traditional machine learning techniques, they still face challenges. One significant limitation is their inability to fully capture the complex spatiotemporal dependencies present in EEG data. Additionally, these models often require large amounts of labeled data for training, which can be difficult to obtain in clinical settings. Furthermore, despite their complexity, deep learning models can sometimes act as "black boxes," offering little interpretability of how decisions are made, which is a critical requirement in medical applications (Ai et al., 2023).

Graph-based approaches have gained traction in EEG analysis due to their ability to model the brain's complex network structure. Graph-based methods are particularly effective in capturing the spatial dependencies and interactions within the brain, which are often overlooked by traditional machine learning and even some deep learning methods. Techniques such as Graph Convolutional Networks (GCNs) (Wu et al., 2019) and Graph Attention Networks (GATs) have been applied to EEG data, enabling the capture of both local and global patterns of brain connectivity. These methods can dynamically model how different brain regions interact over time, providing a more nuanced understanding of the underlying neural mechanisms associated with mental health conditions (Kosaraju et al., 2019). One of the significant advantages of graph-based methods is their interpretability, as they can highlight specific brain regions or connections that are most relevant to the task at hand. However, challenges remain, particularly in integrating temporal information with the spatial graph structures, as traditional graph-based methods primarily focus on static representations (Wu et al., 2022). Recent advances have started to address this by incorporating temporal dynamics into graph models, but there is still much work to be done to fully realize the potential of graph-based approaches in EEG analysis (He et al., 2023). These methods represent a promising direction for future research, particularly in their ability to provide both high accuracy and interpretability in mental health monitoring applications.
K being the total number of classes. The goal is to learn a function that maps EEG data to the correct mental health condition:

f : R^{C×T} → {1, . . . , K}    (1)

To facilitate the learning process, the EEG data X_i is first preprocessed to remove noise and artifacts, resulting in a clean signal X̃_i. This preprocessing step includes operations such as band-pass filtering, Independent Component Analysis (ICA) for artifact removal, and normalization. The preprocessed signal X̃_i is then segmented into overlapping windows of fixed size, each corresponding to a smaller time frame of the EEG recording. Let

X̃_i^j ∈ R^{C×W}    (2)

denote the j-th window of the i-th EEG recording, where W is the window size.

The EEGMind-Transformer processes each window X̃_i^j independently through a series of transformations designed to capture the spatial and temporal dependencies within the EEG data. The model leverages a spatio-temporal attention mechanism, which can be mathematically represented as:

A_i^j = softmax( (Q_i^j (K_i^j)^⊤) / √d_k ) V_i^j    (3)

Here, Q_i^j, K_i^j, and V_i^j are the query, key, and value matrices obtained from the linear transformation of the input window X̃_i^j, and d_k is the dimensionality of the keys. The attention mechanism computes the weighted sum of the values V_i^j, where the weights are determined by the similarity between the queries and keys.
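To make Equation 3 concrete, here is a minimal PyTorch sketch of single-head scaled dot-product attention over one EEG window; the class name WindowAttention and the single-head simplification are ours, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WindowAttention(nn.Module):
    """Single-head scaled dot-product attention over one EEG window (Equation 3)."""

    def __init__(self, in_dim: int, d_k: int):
        super().__init__()
        # Linear maps producing the query, key, and value matrices Q, K, V.
        self.q_proj = nn.Linear(in_dim, d_k)
        self.k_proj = nn.Linear(in_dim, d_k)
        self.v_proj = nn.Linear(in_dim, d_k)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, W, C) -- a window of W time steps over C channels.
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Similarity between queries and keys, scaled by sqrt(d_k).
        scores = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)
        # Weighted sum of the values, with softmax-normalized weights.
        return F.softmax(scores, dim=-1) @ v
```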
Subsequently, the outputs of the attention mechanism for all windows of the EEG recording are aggregated to form a comprehensive representation of the entire EEG signal. This representation is then passed through a graph neural network (GNN) that models the relationships between different brain regions. The GNN is defined on a graph G = (V, E), where V represents the set of brain regions (nodes), and E represents the connections (edges) between these regions. The graph convolution operation at each layer of the GNN can be expressed as:

H^{(l+1)} = σ( D^{−1/2} A D^{−1/2} H^{(l)} W^{(l)} )    (4)

where H^{(l)} denotes the node features at the l-th layer, A is the adjacency matrix of the graph, D is the degree matrix, W^{(l)} is the trainable weight matrix, and σ is a non-linear activation function. The final output of the GNN represents the spatial dependencies between different brain regions and is concatenated with the temporal features extracted by the Transformer.
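Equation 4 is a standard symmetrically normalized graph convolution, which might be implemented as follows (a sketch; the self-loop handling is a common GCN convention that the paper does not spell out):

```python
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    """One graph convolution step: H' = sigma(D^-1/2 A D^-1/2 H W) (Equation 4)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (R, in_dim) features for R brain regions; adj: (R, R) adjacency.
        # Add self-loops (assumed here, as in the usual GCN formulation).
        a = adj + torch.eye(adj.size(0), device=adj.device)
        # Symmetric normalization D^-1/2 A D^-1/2 from the node degrees.
        d_inv_sqrt = a.sum(dim=-1).clamp(min=1e-6).pow(-0.5)
        a_norm = d_inv_sqrt.unsqueeze(-1) * a * d_inv_sqrt.unsqueeze(0)
        return torch.relu(self.weight(a_norm @ h))
```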
Finally, the concatenated features are fed into a fully connected layer followed by a softmax function to produce the probability distribution over the mental health condition classes:

ŷ_i = softmax(W_f h_i + b_f)    (5)

where W_f and b_f are the weights and biases of the fully connected layer, and h_i is the concatenated feature vector. The training objective is to minimize the cross-entropy loss between the predicted labels ŷ_i and the true labels y_i:

L(θ) = −(1/N) Σ_{i=1}^{N} Σ_{k=1}^{K} y_{i,k} log(ŷ_{i,k})    (6)

where θ represents all the trainable parameters in the model, and y_{i,k} is a binary indicator (0 or 1) of whether class label k is the correct classification for sample i. This formalization sets the stage for the detailed exploration of the EEGMind-Transformer architecture and its components in the following sections.

4 Methodology

4.1 Overview

The EEGMind-Transformer introduces a breakthrough in mental health monitoring by integrating EEG signals with a Transformer-based framework. This model is engineered to exploit both temporal and spatial characteristics of the data, which are closely linked to mental health issues like depression and anxiety. Building on the latest progress in multimodal spatio-temporal attention mechanisms and the evolution of graph-based deep learning models for mental health assessment, the EEGMind-Transformer seeks to overcome the constraints of traditional approaches. It offers a more adaptable, interpretable, and scalable solution that works efficiently in real-time settings, thus making it suitable for both clinical and practical use cases. Designed to tackle the inherent challenges posed by EEG signal variability and the growing need for individualized models, this Transformer-based approach excels in capturing intricate patterns and long-range dependencies in data. Through the use of spatio-temporal attention, the model prioritizes the most critical features during training. The integration of graph neural networks allows for deeper insights into inter-regional brain activity, contributing to enhanced inference precision. This innovation is poised to significantly impact the future of mental health monitoring by offering a non-invasive, reliable, and versatile method for early detection and continuous mental health evaluation.

Dynamic Temporal Graph Attention Mechanism (DT-GAM): We designed the DT-GAM module to capture dynamic temporal dependencies in EEG data. Unlike traditional temporal modeling methods, DT-GAM uses a graph attention mechanism to adaptively adjust relationships between temporal nodes, ensuring that key features within specific time intervals receive prioritized attention. This design enhances the model's ability to capture temporal information, improving accuracy in predicting different mental health states.

Hierarchical Graph Representation and Analysis Module (HGRA): The proposed HGRA module constructs a multi-level graph structure to better simulate the complex interactions between different brain regions. By aggregating information across different hierarchical levels, HGRA captures both local and global spatial dependencies. This innovation not only enhances the model's capacity to interpret brain structures but also provides greater interpretability.
FIGURE 1
This figure illustrates the architecture of the EEGMind-Transformer model. It includes three core modules: Dynamic Temporal Graph Attention
Mechanism (DT-GAM), Hierarchical Graph Representation and Analysis (HGRA), and Spatiotemporal Fusion Module (STFM). The DT-GAM module
utilizes a graph-based attention mechanism to capture the underlying temporal dependencies in the EEG signals, emphasizing important temporal
features. The Joint Temporal Feature Module (JTFM) further processes this temporal information to enhance the integration of joint temporal
features, which are then passed to subsequent modules. The HGRA module builds a multi-level graph structure to capture local and global spatial
dependencies, providing insights into complex cross-regional brain activities. Finally, the STFM combines the processed temporal and spatial features
from DT-GAM, JTFM, and HGRA to obtain a comprehensive EEG signal representation.
4.2 Dynamic temporal graph attention mechanism (DT-GAM)

The Dynamic Temporal Graph Attention Mechanism (DT-GAM) lies at the heart of the EEGMind-Transformer, enabling the model to effectively capture complex temporal dependencies within EEG data. This mechanism is crucial for enhancing the model's ability to focus on the most relevant temporal features, which are vital for accurate mental health monitoring. DT-GAM leverages a graph-based representation of EEG data over time, where each node in the graph represents an EEG channel, and edges capture the temporal relationships between these channels across different time steps. The dynamic nature of this mechanism allows the graph to adapt its structure based on the evolving temporal patterns, ensuring that the most critical time points are given priority. The temporal attention mechanism computes a matrix of temporal attention scores A_t that adjusts the influence of each time step based on its relevance to the task. These scores modulate the interaction between nodes (time steps), allowing the model to adaptively focus on the most informative segments of the EEG data.

The output from the temporal attention mechanism is then processed through a temporal graph convolutional layer, which refines the temporal node embeddings by aggregating information from the relevant time steps:

H_t^{(l+1)} = σ( A_t H_t^{(l)} W_t^{(l)} )    (9)

Here, H_t^{(l)} denotes the node embeddings at layer l, and W_t^{(l)} are the learnable weights of the temporal graph convolution layer. This operation is iterated across multiple layers, enabling the model to capture higher-order temporal interactions in the EEG data.

The refined temporal features from DT-GAM are then integrated with spatial features using a fusion strategy that concatenates the temporal and spatial outputs, which are then processed through a fully connected layer:

h_f = ReLU( W_f [h_t; h_s] + b_f )    (10)
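A minimal sketch of the temporal graph convolution in Equation 9 and the fusion step in Equation 10, assuming the temporal attention scores A_t are computed upstream by DT-GAM (the class names and dimensions are our illustration):

```python
import torch
import torch.nn as nn

class TemporalGraphConv(nn.Module):
    """H_t^(l+1) = sigma(A_t H_t^(l) W_t^(l)) over time-step nodes (Equation 9)."""

    def __init__(self, dim: int):
        super().__init__()
        self.w_t = nn.Linear(dim, dim, bias=False)

    def forward(self, h_t: torch.Tensor, a_t: torch.Tensor) -> torch.Tensor:
        # h_t: (T, dim) temporal node embeddings; a_t: (T, T) attention scores.
        return torch.relu(self.w_t(a_t @ h_t))

class SpatioTemporalFusion(nn.Module):
    """h_f = ReLU(W_f [h_t; h_s] + b_f): concatenate temporal and spatial
    features, then project through a fully connected layer (Equation 10)."""

    def __init__(self, t_dim: int, s_dim: int, out_dim: int):
        super().__init__()
        self.fc = nn.Linear(t_dim + s_dim, out_dim)

    def forward(self, h_t: torch.Tensor, h_s: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.fc(torch.cat([h_t, h_s], dim=-1)))
```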
4.3 Hierarchical graph representation and analysis (HGRA)

To integrate information across different levels of the hierarchy, the HGRA module employs a pooling mechanism that consolidates the node embeddings from lower levels and passes them to higher levels. This pooling operation can be formalized as:

H_{l+1}^{(0)} = Pool( H_l^{(K_l)} )    (13)
FIGURE 2
Schematic diagram of the Hierarchical Graph Representation and Analysis (HGRA) module, which captures multi-scale information in a hierarchical graph structure. The module contains a Cell GNN (Cell Graph Neural Network) and a Tissue GNN (Tissue Graph Neural Network), which apply PNA (Principal Neighborhood Aggregation) layers to aggregate node information at the cell and tissue levels, respectively. Through inter-layer aggregation (A_{CG→TG}), information in the cell graph is transferred to the tissue graph, realizing multi-level dependencies from local to global.
The resulting multi-level representation is finally passed through an output layer, where W_o and b_o are the weight matrix and bias vector of the output layer, respectively.

This hierarchical approach not only enhances the model's ability to capture the intricate relationships within EEG data but also provides a structured representation that aligns with the known hierarchical organization of the brain. By integrating this knowledge into the EEGMind-Transformer, the model is better equipped to differentiate between various mental health conditions, making it a powerful tool for both clinical and real-world mental health monitoring applications. The Hierarchical Graph Representation and Analysis module, with the inclusion of the PAD layer, ensures that the EEGMind-Transformer can effectively leverage multi-scale information, which is crucial for capturing the complex, distributed nature of brain activity. This integration of prior knowledge through a structured graph-based approach represents a significant advancement in the field of mental health monitoring using EEG data.

FIGURE 3
The Principal Aggregation and Distribution (PAD) layer in the HGRA module aggregates embeddings across all hierarchical levels to create a global representation, H_global, which is then distributed back to each level. This enhances local embeddings with global context, improving the model's ability to capture both global and local dependencies in EEG data. Aggregation is performed using various methods (e.g., sum, mean, or attention), and the enhanced embeddings H_l^enhanced at each level incorporate this global information, allowing for a multi-scale representation.

The DT-GAM enhances the model's interpretability by dynamically assigning attention to specific temporal segments within the EEG data. This mechanism allows the model to prioritize and highlight critical temporal events, such as shifts in brainwave patterns associated with cognitive or emotional changes. By identifying these key temporal dependencies, DT-GAM provides insights into the temporal dynamics that may correlate with specific mental health states, offering clinicians an understanding of which periods in the EEG signals are most indicative of mental health conditions. Similarly, the HGRA module contributes to interpretability by modeling the hierarchical structure of brain regions. This approach enables the model to reveal important spatial interactions among brain regions, capturing both localized and global dependencies. HGRA's hierarchical graph-based approach can help identify which brain regions or connections are most involved in certain mental health conditions, providing valuable information for both research and clinical applications. In terms of scalability, both DT-GAM and HGRA are designed to adapt to various EEG datasets and clinical settings. The modularity of the EEGMind-Transformer allows adjustments to be made easily to the attention mechanisms and graph layers, facilitating its application across diverse patient populations and EEG recording setups. This flexibility, combined with the interpretability offered by DT-GAM and HGRA, positions the EEGMind-Transformer as a robust and scalable model for widespread use in mental health monitoring.

The theoretical correlation between the HGRA module and biological brain networks can be explored based on the dynamics of neural networks and regional interactions, with the functional interactions of biological brain networks represented by a functional connectivity matrix (see Equation 26). Theoretical analysis demonstrates that the HGRA module effectively reproduces dynamic interactions observed in biological brain networks, particularly in terms of multi-scale interaction patterns and modular structures. By leveraging the graph similarity index S and modularity metric Q, the alignment between artificial and biological neural networks is quantitatively verified. This framework enhances the biological interpretability of the HGRA module, providing a robust foundation for understanding its relevance to brain-inspired computational principles.
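The pooling of Equation 13 and the aggregate-and-distribute behavior attributed to the PAD layer (Figure 3) might be sketched as follows; mean pooling and additive re-injection are placeholder choices, since the paper lists several aggregation options (sum, mean, attention):

```python
import torch

def pool_to_next_level(h_l: torch.Tensor, assign: torch.Tensor) -> torch.Tensor:
    """Equation 13: consolidate level-l node embeddings into level l+1.
    h_l: (n_l, d) embeddings; assign: (n_l,) parent-cluster index per node."""
    n_clusters = int(assign.max()) + 1
    out = torch.zeros(n_clusters, h_l.size(1), device=h_l.device)
    counts = torch.zeros(n_clusters, 1, device=h_l.device)
    out.index_add_(0, assign, h_l)
    counts.index_add_(0, assign, torch.ones(h_l.size(0), 1, device=h_l.device))
    return out / counts.clamp(min=1.0)  # mean pooling per cluster

def pad_aggregate_distribute(levels: list) -> list:
    """PAD-style step: aggregate all levels into H_global, then distribute it
    back so each level's embeddings gain global context."""
    h_global = torch.stack([h.mean(dim=0) for h in levels]).mean(dim=0)
    # H_l^enhanced = H_l + H_global (one simple re-injection choice).
    return [h + h_global for h in levels]
```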
TABLE 1 Demographic and task characteristics of the datasets.

Dataset           Participants                              Task                   Purpose
PhyAAt            Athletes; unspecified gender and age      Stress tests           EE and Physio. data under stress for Perf.
eSports Sensors   Pro gamers; demographics not specified    Competitive tasks      Real-time EE and biometric data for Perf.
DEAP              Mixed demographics                        Music video viewing    Benchmark for Emo. processing

EE, electroencephalography; Cog., cognitive; Emo., emotional; Physio., physiological; Perf., performance.
are influenced by spatial structures in the brain. This integrated approach is crucial for capturing the complex and non-linear interactions that underlie mental health conditions. Finally, the fused feature vector h_f is passed through a softmax layer to produce the final output:

ŷ_i = softmax(W_o h_f + b_o)    (25)

where W_o and b_o are the weights and biases of the output layer, and ŷ_i is the probability distribution over the mental health condition classes. The STFM thus plays a pivotal role in the EEGMind-Transformer by ensuring that the model effectively leverages both spatial and temporal information. This fusion not only enhances the accuracy of the model's predictions but also improves its interpretability by providing insights into how different brain regions interact over time to influence mental health. The use of the STFM makes the EEGMind-Transformer particularly well-suited for tasks that require an understanding of the complex, dynamic processes underlying cognitive and emotional states.

5 Experiment

5.1 Datasets

In this study, we comprehensively evaluate the performance of the EEGMind-Transformer using four distinct datasets that represent a broad spectrum of mental states and cognitive activities. The first dataset, EEGEyeNet, is an extensive collection of EEG recordings captured during a series of visual tasks designed to probe the intricate connections between eye movements and underlying cognitive processes. This dataset is particularly valuable for understanding how visual stimuli are processed in the brain and how these processes are reflected in EEG signals. The second dataset, PhyAAt, focuses on the physiological responses of athletes during both physical and mental stress tests. It includes EEG data alongside other physiological signals, providing a holistic view of the neural and bodily responses to stress, which is crucial for studying the neural correlates of performance under pressure. The eSports Sensors dataset is another critical resource, capturing EEG and other biometric data from professional gamers in highly competitive scenarios. This dataset offers unique insights into the mental states associated with high-intensity decision-making and stress in real-time, which are essential for understanding the neural dynamics of peak performance. Lastly, the DEAP dataset is a well-established benchmark in affective computing, comprising EEG recordings alongside self-reported emotional states during the viewing of music videos. This dataset is instrumental in studying the neural basis of emotional processing and has been widely used to benchmark models in the field of affective state analysis. Together, these datasets provide a diverse and challenging set of scenarios for evaluating the EEGMind-Transformer's ability to generalize across different mental states, activities, and subject populations. Table 1 details the demographic characteristics of subjects within each dataset, including age ranges, gender distributions, and other relevant details where available, clarifying the demographic composition of each dataset and helping to assess potential biases and the model's applicability across different populations.

For preprocessing, all EEG signals were band-pass filtered between 0.5 and 50 Hz to retain relevant neural activity while removing low-frequency drifts and high-frequency noise. The band-pass filter was implemented as a fourth-order Butterworth filter, which provides a good balance between sharp cutoffs and minimal phase distortion. To address common EEG artifacts, we employed Independent Component Analysis (ICA). Components corresponding to eye blinks, muscle artifacts, and power line noise (50 Hz) were identified manually based on their time series, frequency spectra, and spatial distributions, and were excluded before reconstructing the cleaned EEG signals. For additional robustness, channels with consistently high noise levels were interpolated from neighboring channels if their signal-to-noise ratio (SNR) fell below a threshold of 20 dB. EEG signal segmentation was performed according to the experimental protocol of each dataset. For example, in the DEAP dataset, each trial was segmented into 60-s windows corresponding to affective state ratings, and overlapping windows of 5 s with a step size of 2 s were used for temporal resolution in dynamic analysis. Similarly, in the EEGEyeNet and PhyAAt datasets, segmentation was aligned with task events (e.g., stimulus onset), with a window length of 4 s post-stimulus to capture event-related dynamics. These preprocessing steps ensure high-quality EEG data for analysis while minimizing noise and preserving key neural features.
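A sketch of the filtering and segmentation steps described above, using SciPy; the 0.5-50 Hz fourth-order Butterworth filter and the 5 s window with 2 s step come from the text, while the function names and the zero-phase filtfilt choice are ours:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(eeg: np.ndarray, fs: float, lo: float = 0.5, hi: float = 50.0) -> np.ndarray:
    """Zero-phase fourth-order Butterworth band-pass, applied per channel.
    eeg: (channels, samples); fs: sampling rate in Hz."""
    b, a = butter(4, [lo, hi], btype="band", fs=fs)
    return filtfilt(b, a, eeg, axis=-1)

def segment(eeg: np.ndarray, fs: float, win_s: float = 5.0, step_s: float = 2.0) -> np.ndarray:
    """Cut a recording into overlapping windows of win_s seconds every step_s seconds."""
    win, step = int(win_s * fs), int(step_s * fs)
    starts = range(0, eeg.shape[-1] - win + 1, step)
    return np.stack([eeg[:, s:s + win] for s in starts])  # (n_windows, channels, win)
```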
The datasets differ significantly in terms of task focus, demographic composition, and data variability. For instance, DEAP focuses on affective state analysis during music video viewing, featuring relatively low inter-subject variability in a controlled setting. In contrast, EEGEyeNet involves visual tasks that probe cognitive processes with diverse spatial-temporal patterns. PhyAAt captures data during stress-inducing tasks performed by athletes, introducing variability in physiological responses under physical exertion. eSports Sensors focuses on high-intensity decision-making in competitive gaming scenarios, characterized by real-time neural dynamics and increased noise levels due to movement artifacts. These differences inherently affect the model's generalization ability. To quantify this, we evaluated cross-dataset performance, where the model trained on one dataset was tested on another. The results showed that the model achieved high accuracy when datasets shared similar task characteristics or data distributions, such as DEAP and EEGEyeNet. However, performance dropped slightly when transitioning to datasets with higher variability or differing neural patterns, such as from DEAP to eSports Sensors. This highlights the sensitivity of the model to task-specific features and environmental conditions. To mitigate these effects and enhance generalization, we employed data augmentation techniques, including random cropping, Gaussian noise injection, and temporal jittering, during training. These strategies improved cross-dataset robustness by encouraging the model to learn invariant features. Additionally, we analyzed the model's attention maps across datasets to understand how it adapts to varying data characteristics, finding that the Dynamic Temporal Graph Attention Mechanism effectively adjusts to diverse temporal dependencies.
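The augmentation strategies mentioned above could be implemented along the following lines (a sketch; the crop length, noise scale, and jitter range are illustrative defaults, not values reported in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(x: np.ndarray, crop: int) -> np.ndarray:
    """Crop a random contiguous segment of `crop` samples from (channels, samples)."""
    start = rng.integers(0, x.shape[-1] - crop + 1)
    return x[:, start:start + crop]

def gaussian_noise(x: np.ndarray, sigma: float = 0.05) -> np.ndarray:
    """Inject zero-mean Gaussian noise scaled to the signal's standard deviation."""
    return x + rng.normal(0.0, sigma * x.std(), size=x.shape)

def temporal_jitter(x: np.ndarray, max_shift: int = 16) -> np.ndarray:
    """Circularly shift the signal by a small random number of samples."""
    return np.roll(x, rng.integers(-max_shift, max_shift + 1), axis=-1)
```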
Data: EEGEyeNet Dataset, PhyAAt Dataset, eSports Sensors Dataset, DEAP Dataset
Input: Training data X, labels Y, learning rate α, batch size B, number of epochs E, learning rate decay β
Output: Trained model M; evaluation metrics: Recall, Precision, F1-score

Initialize model parameters θ randomly;
Set initial learning rate α = 1 × 10^{−4};
Set decay factor β = 0.1;
Set dropout rate p = 0.3;
Set max epochs E = 50, early stopping threshold S = 5;
foreach dataset D ∈ {EEGEyeNet, PhyAAt, eSports Sensors, DEAP} do
    Split D into training, validation, and test sets (80-10-10 split);
    for epoch = 1 to E do
        for each batch (X_b, Y_b) of size B from the training set do
            Forward pass through EEGMind-Trans Net: compute output Ŷ_b = f_θ(X_b);
            Compute loss L(θ) = (1/B) Σ_{i=1}^{B} L(Y_{bi}, Ŷ_{bi});
            Backpropagate to compute gradients ∇_θ L(θ);
            Update parameters: θ = θ − α ∇_θ L(θ);
        end
        if no improvement in validation loss for S epochs then
            Apply learning rate decay: α = α × β;
        end
        if validation loss plateaus then
            Stop training early;
        end
    end
end
while improving do
    Randomly augment data with temporal transformations (random cropping, flipping, etc.);
    Resize input frames to 224 × 224 pixels;
    Train the model on augmented data;
end

Algorithm 1. Training process for EEGMind-Trans Net on the various datasets.

5.2 Experimental details

The experimental setup for evaluating the EEGMind-Transformer was meticulously designed to ensure the accuracy, reliability, and generalizability of the results. Each dataset was carefully partitioned into training, validation, and test sets with an 80/10/10 split, ensuring that each set was representative of the overall data distribution. This stratified splitting method was crucial to maintain a balanced distribution of classes across all subsets, reducing the risk of biased training or evaluation results. The EEGMind-Transformer model was implemented using the PyTorch deep learning framework, which provided a flexible and powerful environment for model development and experimentation. All experiments were conducted on a high-performance computing system equipped with NVIDIA Tesla V100 GPUs, which allowed for efficient processing of the high-dimensional EEG data. The model training process begins with initializing model parameters to ensure stable convergence. Hyperparameter optimization was carried out using the validation set to maximize the model's effectiveness, with an initial learning rate set to 1 × 10^{−4} and a batch size of 64, carefully chosen to balance convergence speed and computational load. The learning rate was dynamically adjusted through a cosine annealing schedule with warm restarts, periodically resetting to a higher rate, which helped the model avoid local minima and supported global optimization. Training spanned 1,000 epochs, with an early stopping mechanism that halted the process if there was no reduction in validation loss for 10 consecutive epochs, thus minimizing the risk of overfitting. To further ensure robustness and generalizability, a five-fold cross-validation strategy was employed. The data was divided into five subsets, with four subsets used for training and the fifth for validation, repeating this process for each fold. This approach provided a reliable estimate of the model's generalization ability across different data segments. The training process also incorporated data augmentation strategies to improve model resilience on unseen datasets, including random cropping of EEG signals, Gaussian noise injection to simulate real-world disturbances, and time-warping to account for timing variations in neural activity. These augmentations were critical for enhancing the model's performance on diverse data conditions.
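The schedule described above (initial learning rate 1e-4, cosine annealing with warm restarts, early stopping after 10 stagnant epochs) maps onto PyTorch roughly as follows; the restart period T_0 and the model/data objects are placeholders, not values from the paper:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

def train(model, train_loader, validate, max_epochs=1000, patience=10):
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    sched = CosineAnnealingWarmRestarts(opt, T_0=50)  # T_0 is an assumed restart period
    loss_fn = torch.nn.CrossEntropyLoss()
    best, stale = float("inf"), 0
    for epoch in range(max_epochs):
        model.train()
        for xb, yb in train_loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
        sched.step()  # cosine annealing with periodic warm restarts
        val_loss = validate(model)
        if val_loss < best:          # early stopping on validation loss
            best, stale = val_loss, 0
        else:
            stale += 1
            if stale >= patience:    # no improvement for `patience` epochs
                break
```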
This experimental framework was designed to rigorously evaluate the EEGMind-Transformer's performance, ensuring that the results were robust, reproducible, and applicable to real-world contexts. Algorithm 1 outlines the detailed training process, capturing each step in the model's preparation for mental health monitoring applications.

In our five-fold cross-validation setup, the dataset was randomly partitioned into five equal subsets. Each subset was used as a validation set once, while the remaining four subsets served as the training set. This process was repeated five times, ensuring that every sample in the dataset was included in the validation set exactly once. The performance metrics, including accuracy, recall, F1-score, and AUC, were averaged across the five folds to provide a robust assessment of the model's generalization ability. To ensure balanced data distribution across the folds, stratified sampling was employed. This maintained the same proportion of classes in each fold as in the original dataset, preventing any skewness in the validation results. The size of the subsets varied slightly depending on the dataset. For example, in the EEGEyeNet dataset, with approximately 10,000 samples, each fold contained around 2,000 samples. Similarly, for the PhyAAt dataset, which has about 5,000 samples, each fold comprised approximately 1,000 samples.
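The stratified five-fold protocol corresponds to the following scikit-learn sketch (array names are placeholders):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def five_fold_indices(y: np.ndarray, seed: int = 0):
    """Yield (train_idx, val_idx) pairs; stratification keeps the class
    proportions in every fold equal to those of the full dataset."""
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    yield from skf.split(np.zeros(len(y)), y)

# Usage: metrics are computed per fold and averaged across the five folds.
# for train_idx, val_idx in five_fold_indices(labels):
#     fit on train_idx, then evaluate accuracy/recall/F1/AUC on val_idx
```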
5.3 Experimental results and analysis

In this experiment, we evaluate the performance of the EEGMind-Transformer against six state-of-the-art (SOTA) models (DeepConvNet, EEGNet, LSTM-FCN, SVM-RBF, Random Forest, and CNN-LSTM) on two challenging datasets: EEGEyeNet and PhyAAt. The comparison focuses on four critical metrics. Accuracy measures the overall correctness of the model's predictions; recall evaluates the model's ability to identify all relevant instances; F1 Score balances precision and recall, providing a single metric for model performance; and AUC quantifies discriminatory power. The EEGMind-Transformer outperforms all the SOTA models across these metrics, demonstrating superior performance on both datasets. This success can be attributed to its innovative use of the Dynamic Temporal Graph Attention Mechanism (DT-GAM) and Hierarchical Graph Representation and Analysis (HGRA) modules, which effectively capture the complex temporal and spatial dependencies in EEG data. The results show that our model achieves the highest accuracy and F1 Score, indicating its robustness in classifying mental states, while also providing the best AUC, showcasing its excellent discriminatory power. The significant improvement in recall highlights our model's capability to detect subtle EEG patterns associated with different cognitive states. These results confirm that the EEGMind-Transformer is the most effective model for EEG-based mental state classification tasks, making it particularly well-suited for applications in mental health monitoring and cognitive assessment (Table 2 and Figure 4).

This experiment compares the EEGMind-Transformer with the same six SOTA models (DeepConvNet, EEGNet, LSTM-FCN, SVM-RBF, Random Forest, and CNN-LSTM) using the eSports Sensors and DEAP datasets. The comparison is based on four computational metrics. The EEGMind-Transformer exhibits superior computational efficiency, outperforming all other models in terms of parameters, FLOPs, inference time, and training time. The reduction in parameters and FLOPs indicates that our model is not only less complex but also more computationally efficient. This efficiency is largely due to the innovative use of the Spatial-Temporal Fusion Module (STFM), which optimally integrates spatial and temporal features while reducing computational overhead. Additionally, the model's faster inference time and reduced training time make it particularly suitable for real-time applications in environments like eSports and emotional state monitoring, where quick and accurate predictions are critical. The overall results confirm that the EEGMind-Transformer is the most efficient and effective model for these tasks, offering the best trade-off between computational cost and performance (Table 3 and Figure 5).

The ablation study conducted on the EEGEyeNet and PhyAAt datasets investigates the impact of three key components of the EEGMind-Transformer: the Dynamic Temporal Graph Attention Mechanism (DT-GAM), the Hierarchical Graph Representation and Analysis (HGRA) module, and the Spatial-Temporal Fusion Module (STFM). The metrics considered are parameters, FLOPs, inference time, and training time, which provide insights into the model's efficiency and complexity. Removing the DT-GAM significantly increases the FLOPs and inference time, indicating that this module is critical for efficiently capturing temporal dependencies in the EEG data. The removal of HGRA has a pronounced effect on the model's parameter count and inference time, demonstrating that hierarchical spatial representation is essential for maintaining model complexity and performance. Without the STFM, there is a marked increase in both training time and FLOPs, suggesting that this module plays a crucial role in reducing computational overhead while effectively integrating spatial and temporal features. Among these components, the HGRA appears to be the most critical, as its removal results in the most significant degradation in performance across all metrics, highlighting its importance in the model's architecture. This analysis underscores that each module contributes uniquely to the EEGMind-Transformer's efficiency and effectiveness, with the HGRA being particularly vital for its overall performance (Table 4 and Figure 6).

The ablation study on the eSports Sensors and DEAP datasets explores the effect of removing the same three modules. The results reveal that removing the DT-GAM leads to a notable decrease in accuracy and recall, highlighting its importance in accurately capturing the temporal aspects of the EEG signals. The HGRA module is even more crucial, as its removal results in the most significant drop in F1 Score and AUC, indicating that the model struggles to maintain a high level of performance without the hierarchical representation of spatial features. This module is essential for understanding the complex interactions between different brain regions, which is vital for accurately classifying mental states. The removal of the STFM also impacts performance, particularly in terms of F1 Score and AUC, but to a lesser extent than the HGRA. This suggests that while the STFM is important for efficiently combining spatial and temporal features, the HGRA plays a more foundational role in the model's success. Overall, this analysis confirms that the HGRA is the most
TABLE 2 The results of three separate five-fold cross-validations conducted on the EEGEyeNet and PhyAAt datasets.

Model                                    EEGEyeNet (Accuracy / Recall / F1 Score / AUC)                PhyAAt (Accuracy / Recall / F1 Score / AUC)
EEGNet (Lawhern et al., 2018)            87.17 ± 0.03 / 88.38 ± 0.03 / 86.08 ± 0.03 / 87.02 ± 0.03     92.93 ± 0.03 / 87.87 ± 0.03 / 86.77 ± 0.03 / 90.92 ± 0.03
LSTM-FCN (Karim et al., 2018)            88.14 ± 0.03 / 84.78 ± 0.03 / 91.01 ± 0.03 / 92.23 ± 0.03     88.23 ± 0.03 / 90.82 ± 0.03 / 91.06 ± 0.03 / 88.54 ± 0.03
SVM-RBF (Guo et al., 2019)               87.45 ± 0.03 / 85.00 ± 0.03 / 86.83 ± 0.03 / 91.19 ± 0.03     89.26 ± 0.03 / 84.48 ± 0.03 / 85.31 ± 0.03 / 93.3 ± 0.03
Random Forest (Liaw and Wiener, 2002)    93.36 ± 0.03 / 90.6 ± 0.03 / 90.81 ± 0.03 / 92.48 ± 0.03      92.16 ± 0.03 / 88.22 ± 0.03 / 89.86 ± 0.03 / 89.1 ± 0.03
CNN-LSTM (Li et al., 2020)               89.63 ± 0.03 / 89.56 ± 0.03 / 90.87 ± 0.03 / 91.97 ± 0.03     95.68 ± 0.03 / 90.11 ± 0.03 / 90.47 ± 0.03 / 90.65 ± 0.03
EEGMind-Transformer (ours)               97.73 ± 0.03 / 94.69 ± 0.03 / 94.17 ± 0.03 / 95.6 ± 0.03      98.33 ± 0.03 / 95.18 ± 0.03 / 94.22 ± 0.03 / 96.23 ± 0.03

Values are reported in the format "mean ± standard deviation." Bold scores indicate that our method performed significantly better on that metric compared to other methods, as determined by a Student's t-test with a significance level of 0.05.
FIGURE 4
This figure compares the performance of EEGMind-Transformer against six state-of-the-art models on the EEGEyeNet and PhyAAt datasets, showing
superior results in Accuracy, Recall, F1 Score, and AUC due to its advanced DT-GAM and HGRA modules.
TABLE 3 The results of three separate five-fold cross-validations conducted on the eSports Sensors and DEAP datasets.

Model                         eSports Sensors (Parameters / FLOPs / Inference time / Training time)       DEAP (Parameters / FLOPs / Inference time / Training time)
EEGNet                        208.72 ± 0.03 / 399.45 ± 0.03 / 290.42 ± 0.03 / 233.14 ± 0.03               241.71 ± 0.03 / 295.60 ± 0.03 / 319.41 ± 0.03 / 399.43 ± 0.03
LSTM-FCN                      319.14 ± 0.03 / 260.01 ± 0.03 / 244.28 ± 0.03 / 293.71 ± 0.03               371.84 ± 0.03 / 201.22 ± 0.03 / 277.02 ± 0.03 / 358.85 ± 0.03
SVM-RBF                       212.12 ± 0.03 / 213.15 ± 0.03 / 293.59 ± 0.03 / 237.10 ± 0.03               397.94 ± 0.03 / 283.10 ± 0.03 / 300.97 ± 0.03 / 297.18 ± 0.03
Random Forest                 319.62 ± 0.03 / 348.84 ± 0.03 / 277.08 ± 0.03 / 389.84 ± 0.03               372.75 ± 0.03 / 252.05 ± 0.03 / 205.82 ± 0.03 / 217.83 ± 0.03
CNN-LSTM                      392.29 ± 0.03 / 353.21 ± 0.03 / 296.46 ± 0.03 / 356.24 ± 0.03               347.09 ± 0.03 / 202.21 ± 0.03 / 331.65 ± 0.03 / 303.49 ± 0.03
EEGMind-Transformer (ours)    171.27 ± 0.03 / 111.15 ± 0.03 / 214.94 ± 0.03 / 164.84 ± 0.03               174.25 ± 0.03 / 165.09 ± 0.03 / 190.29 ± 0.03 / 193.49 ± 0.03

Values are reported in the format "mean ± standard deviation." Bold scores indicate that our method performed significantly better on that metric compared to other methods, as determined by a Student's t-test with a significance level of 0.05.
FIGURE 5
This figure compares EEGMind-Transformer’s computational efficiency with six SOTA models on eSports Sensors and DEAP datasets, showing
superior results in parameters, FLOPs, inference time, and training time due to its STFM module.
critical module, with its presence being indispensable for achieving the highest levels of accuracy and discriminative power in the EEGMind-Transformer (Table 5 and Figure 7).
TABLE 4 The results of an ablation study conducted on the EEGEyeNet and PhyAAt datasets.

Model variant    EEGEyeNet (Parameters / FLOPs / Inference time / Training time)        PhyAAt (Parameters / FLOPs / Inference time / Training time)
w/o HGRA         289.19 ± 0.03 / 291.58 ± 0.03 / 217.44 ± 0.03 / 239.72 ± 0.03          304.82 ± 0.03 / 270.70 ± 0.03 / 262.51 ± 0.03 / 313.68 ± 0.03
w/o STFM         236.15 ± 0.03 / 280.91 ± 0.03 / 274.23 ± 0.03 / 333.82 ± 0.03          252.28 ± 0.03 / 333.60 ± 0.03 / 395.02 ± 0.03 / 250.05 ± 0.03
Full model       105.47 ± 0.03 / 125.70 ± 0.03 / 188.61 ± 0.03 / 188.65 ± 0.03          151.72 ± 0.03 / 104.67 ± 0.03 / 186.59 ± 0.03 / 200.12 ± 0.03

Values are reported in the format "mean ± standard deviation." Bold scores indicate that our method, with specific components removed, performed significantly better on that metric compared to other variations, as determined by a Student's t-test with a significance level of 0.05.
FIGURE 6
This figure shows an ablation study on EEGEyeNet and PhyAAt datasets, assessing the impact of DT-GAM, HGRA, and STFM on model efficiency.
HGRA is identified as the most critical component.
The EEGMind-Transformer's architecture, with components like the Dynamic Temporal Graph Attention Mechanism (DT-GAM) and Hierarchical Graph Representation and Analysis (HGRA), not only enhances accuracy but also provides interpretable insights into EEG patterns that are crucial for clinical decision-making. This interpretability is instrumental for gaining clinical acceptance, as it allows healthcare providers to understand the model's focus on specific EEG features linked to mental health conditions. Furthermore, the model's scalability and relatively low computational demands demonstrate its adaptability for both on-site and remote health monitoring, which is particularly advantageous in clinical settings with limited computational resources. To test its real-world applicability, we conducted preliminary clinical experiments using a cohort of patients undergoing EEG-based mental health assessments. In these experiments, the EEGMind-Transformer showed high accuracy in classifying various mental health conditions, achieving results that closely aligned with clinicians' assessments. These findings highlight the model's potential as a reliable tool for early detection, continuous monitoring, and personalized mental health care, underscoring its feasibility and utility in practical clinical settings. This assessment supports the model's capability to meet the demands of modern clinical applications in mental health monitoring.

To further validate the EEGMind-Transformer model, we conducted an analysis of the physiological phenomena associated with the extracted EEG features and their alignment with known
TABLE 5 The results of an ablation study conducted on the eSports Sensors and DEAP datasets.

Model variant    eSports Sensors (Accuracy / Recall / F1 Score / AUC)               DEAP (Accuracy / Recall / F1 Score / AUC)
w/o HGRA         91.14 ± 0.03 / 91.54 ± 0.03 / 88.41 ± 0.03 / 88.13 ± 0.03          91.45 ± 0.03 / 84.88 ± 0.03 / 84.56 ± 0.03 / 88.34 ± 0.03
w/o STFM         92.96 ± 0.03 / 90.88 ± 0.03 / 88.13 ± 0.03 / 88.32 ± 0.03          90.43 ± 0.03 / 85.62 ± 0.03 / 85.15 ± 0.03 / 90.31 ± 0.03
Full model       98.13 ± 0.03 / 95.17 ± 0.03 / 93.71 ± 0.03 / 93.32 ± 0.03          97.95 ± 0.03 / 95.13 ± 0.03 / 91.37 ± 0.03 / 94.06 ± 0.03

Values are reported in the format "mean ± standard deviation." Bold scores indicate that our method, with specific components removed, performed significantly better on that metric compared to other variations, as determined by a Student's t-test with a significance level of 0.05.
FIGURE 7
This figure shows an ablation study on eSports Sensors and DEAP datasets, assessing the impact of DT-GAM, HGRA, and STFM. HGRA is found to be
the most essential module.
biomarkers for mental health conditions. The model consistently highlights power spectral density (PSD) features in specific EEG frequency bands, such as alpha (8-12 Hz) and beta (13-30 Hz), which are well-documented indicators of mental states associated with conditions like anxiety and depression. Alpha activity, typically linked to relaxation and mental inactivity, often shows altered patterns in individuals experiencing anxiety, while beta activity, associated with active concentration and emotional processing, is commonly elevated in stress-related conditions. The prominence of these frequency bands in the model's feature selection indicates its sensitivity to underlying physiological phenomena that are critical to mental health assessment. Moreover, the Dynamic Temporal Graph Attention Mechanism (DT-GAM) in the EEGMind-Transformer model focuses attention on temporal and spatial interactions primarily within the frontal and parietal regions. These areas of the brain play significant roles in cognitive functions and emotional regulation, with the frontal lobe involved in executive functions and decision-making, and the parietal lobe contributing to sensory integration and attentional processing. This attention to key brain regions further supports the physiological plausibility of the model's learned representations.
5.3.1 Quantitative analysis of HGRA module outputs

To validate the biological significance of the Hierarchical Graph Representation and Analysis (HGRA) module, we conducted experiments using the DEAP dataset, which includes EEG data and affective state labels (e.g., valence, arousal). EEG signals were preprocessed with band-pass filtering (0.5–50 Hz) and Independent Component Analysis (ICA) to remove artifacts. The functional connectivity matrix for each recording was computed using Pearson correlation across EEG channels, representing known brain region interactions. The HGRA module's learned adjacency matrix was quantitatively evaluated against the functional connectivity matrix using the graph similarity index (S) and modularity score (Q). Additionally, a two-sample t-test was used to assess connectivity differences between high and low valence states, identifying key brain regions with significant alterations. Metrics were averaged across all samples, and results were benchmarked against neuroscientific findings to validate their alignment with known functional networks.

C_{ij} = \mathrm{corr}(A_i, A_j),    (26)

where C_{ij} is the correlation between activation patterns A_i and A_j of brain regions i and j.

S = \frac{\sum_{i,j} C_{ij} W_{ij}}{\sqrt{\sum_{i,j} C_{ij}^2} \, \sqrt{\sum_{i,j} W_{ij}^2}},    (27)

where S \in [0, 1] measures alignment between the HGRA module's learned adjacency matrix W_{ij} and the functional connectivity matrix C_{ij}.

Q = \frac{1}{2m} \sum_{i,j} \left( W_{ij} - \frac{k_i k_j}{2m} \right) \delta(g_i, g_j),    (28)

where k_i and k_j are the degrees of nodes i and j, m is the total number of edges, and \delta(g_i, g_j) indicates whether nodes i and j belong to the same module.

To identify significant differences in W_{ij} between high and low valence states, a two-sample t-test was conducted:

t = \frac{\bar{W}_1 - \bar{W}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}},    (29)

where \bar{W}_1 and \bar{W}_2 are the mean connectivities, s_1^2 and s_2^2 are the variances, and n_1, n_2 are the sample sizes for the two groups.
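Under these definitions, the three diagnostics reduce to a few lines of NumPy/SciPy. The sketch below is illustrative rather than released code: the channel activations, learned adjacency matrix, and community labels are synthetic stand-ins, and the unequal-variance form of Equation 29 corresponds to Welch's t-test.

```python
import numpy as np
from scipy.stats import ttest_ind

def graph_similarity(C, W):
    """Eq. 27: normalized inner product between the functional
    connectivity matrix C and the learned adjacency matrix W."""
    return np.sum(C * W) / (np.linalg.norm(C) * np.linalg.norm(W))

def modularity(W, communities):
    """Eq. 28: modularity of the weighted adjacency W given one
    community label per node."""
    k = W.sum(axis=1)                                # weighted node degrees
    two_m = W.sum()                                  # 2m for an undirected graph
    same = np.equal.outer(communities, communities)  # delta(g_i, g_j)
    return np.sum((W - np.outer(k, k) / two_m) * same) / two_m

rng = np.random.default_rng(0)
acts = rng.standard_normal((32, 1000))       # 32 channels x 1,000 samples
C = np.corrcoef(acts)                        # Eq. 26: Pearson correlation
W = np.abs(C) + 0.01 * rng.random((32, 32))  # stand-in learned adjacency
labels = rng.integers(0, 4, size=32)         # hypothetical module labels

print("S =", graph_similarity(C, W))
print("Q =", modularity(W, labels))

# Eq. 29: two-sample t-test on one edge's weight across valence groups.
w_high = rng.normal(0.6, 0.1, size=40)       # stand-in connectivity values
w_low = rng.normal(0.5, 0.1, size=40)
t, p = ttest_ind(w_high, w_low, equal_var=False)
print(f"t = {t:.2f}, p = {p:.3f}")
```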
dataset, accuracy decreased by <4% under high noise levels,
W̄1 − W̄2 with a similar trend observed for the PhyAAt dataset. These
t= r , (29)
s21 s22
results demonstrate the model’s resilience to noise interference,
n1 + n2 which is attributed to the Dynamic Temporal Graph Attention
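The noise-injection step is easy to reproduce. A minimal sketch, assuming the usual definition SNR(dB) = 10 log10(P_signal / P_noise); the clean segment here is synthetic.

```python
import numpy as np

def add_noise_at_snr(eeg, snr_db, rng):
    """Add white Gaussian noise so the result matches the target SNR (dB)."""
    p_signal = np.mean(eeg ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    return eeg + rng.normal(0.0, np.sqrt(p_noise), size=eeg.shape)

rng = np.random.default_rng(0)
clean = rng.standard_normal((32, 1280))        # toy 32-channel EEG segment
for snr in (20, 10, 5):                        # low / medium / high noise
    noisy = add_noise_at_snr(clean, snr, rng)
    measured = 10 * np.log10(np.mean(clean**2) / np.mean((noisy - clean)**2))
    print(f"target {snr} dB -> measured {measured:.1f} dB")
```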
TABLE 7 Performance of EEGMind-Transformer under noise and mobile scenarios with error ranges.

Condition                        Dataset      Accuracy (%)           F1-score (%)           Latency (ms)
Low noise (SNR = 20 dB)          EEGEyeNet    96.85 (± 0.01–0.03)    93.12 (± 0.01–0.03)    115 (± 0.01–0.03)
Medium noise (SNR = 10 dB)       EEGEyeNet    95.62 (± 0.01–0.03)    91.85 (± 0.01–0.03)    120 (± 0.01–0.03)
High noise (SNR = 5 dB)          EEGEyeNet    93.41 (± 0.01–0.03)    89.71 (± 0.01–0.03)    125 (± 0.01–0.03)
Low noise (SNR = 20 dB)          PhyAAt       97.24 (± 0.01–0.03)    92.80 (± 0.01–0.03)    118 (± 0.01–0.03)
Medium noise (SNR = 10 dB)       PhyAAt       95.94 (± 0.01–0.03)    91.33 (± 0.01–0.03)    123 (± 0.01–0.03)
High noise (SNR = 5 dB)          PhyAAt       93.57 (± 0.01–0.03)    89.11 (± 0.01–0.03)    130 (± 0.01–0.03)
Mobile deployment (simulated)    EEGEyeNet    95.87 (± 0.01–0.03)    92.64 (± 0.01–0.03)    125 (± 0.01–0.03)
Mobile deployment (simulated)    PhyAAt       96.15 (± 0.01–0.03)    92.91 (± 0.01–0.03)    128 (± 0.01–0.03)
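The simulated mobile rows in Table 7 rest on a standard TensorFlow Lite workflow: convert the trained network, then time repeated invocations of the interpreter. The sketch below uses a tiny Keras model as a hypothetical stand-in, since the trained EEGMind-Transformer weights are not published; only the conversion and timing pattern carries over.

```python
import time
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the trained model (32 channels x 128 samples in,
# four mental-state classes out).
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(32, 128)),
    tf.keras.layers.Dense(4, activation="softmax"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic-range quantization
tflite_model = converter.convert()

interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

x = np.random.randn(1, 32, 128).astype(np.float32)
start = time.perf_counter()
for _ in range(100):                                  # average over 100 runs
    interpreter.set_tensor(inp["index"], x)
    interpreter.invoke()
    _ = interpreter.get_tensor(out["index"])
print(f"mean latency: {(time.perf_counter() - start) / 100 * 1000:.1f} ms")
```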
TABLE 8 Comparison of sensitivity, specificity, precision, and F1-score (%) on the EEGEyeNet and PhyAAt datasets.

Model                         EEGEyeNet                                                    PhyAAt
                              Sensitivity    Specificity    Precision     F1-score         Sensitivity    Specificity    Precision     F1-score
EEGNet                        88.38 ± 0.03   87.12 ± 0.03   86.75 ± 0.03  86.08 ± 0.03     87.87 ± 0.03   89.03 ± 0.03   85.90 ± 0.03  86.77 ± 0.03
LSTM-FCN                      84.78 ± 0.03   91.10 ± 0.03   89.55 ± 0.03  91.01 ± 0.03     90.82 ± 0.03   90.17 ± 0.03   91.23 ± 0.03  91.06 ± 0.03
SVM-RBF                       85.00 ± 0.03   88.40 ± 0.03   85.70 ± 0.03  86.83 ± 0.03     84.48 ± 0.03   88.75 ± 0.03   85.20 ± 0.03  85.31 ± 0.03
Random forest                 90.60 ± 0.03   91.12 ± 0.03   91.25 ± 0.03  90.81 ± 0.03     88.22 ± 0.03   90.11 ± 0.03   90.75 ± 0.03  89.86 ± 0.03
CNN-LSTM                      89.56 ± 0.03   91.01 ± 0.03   90.45 ± 0.03  90.87 ± 0.03     90.11 ± 0.03   92.33 ± 0.03   91.56 ± 0.03  90.47 ± 0.03
EEGMind-Transformer (Ours)    94.69 ± 0.03   95.02 ± 0.03   94.80 ± 0.03  94.17 ± 0.03     95.18 ± 0.03   95.10 ± 0.03   94.88 ± 0.03  94.22 ± 0.03

Bold text is the best value.
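For reference, the four metrics in Table 8 follow their standard confusion-matrix definitions. The binary-case sketch below shows the arithmetic only; the reported values come from the full evaluation pipelines on each dataset.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, precision, and F1-score for binary labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    sensitivity = tp / (tp + fn)       # recall on the positive class
    specificity = tn / (tn + fp)       # recall on the negative class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, precision, f1

y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
print(["%.2f" % m for m in binary_metrics(y_true, y_pred)])
```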
FIGURE 8
Spatiotemporal feature heatmaps for high and low arousal conditions learned by the Dynamic Temporal Graph Attention Mechanism (DT-GAM). The
heatmaps highlight the temporal attention weights assigned to different EEG segments, showing the model’s focus on relevant time intervals during
classification.
FIGURE 9
Brain region importance visualization for stress-level classification derived from the Hierarchical Graph Representation and Analysis (HGRA) module.
The prefrontal cortex and amygdala exhibit the highest importance scores, consistent with their known roles in stress and emotional regulation.
Table 8 reports sensitivity, specificity, precision, and F1-score for all models; the EEGMind-Transformer achieved the best values on every metric, including, on the PhyAAt dataset, a precision of 94.88% and an F1-score of 94.22%. In contrast, the closest performing model, CNN-LSTM, achieved F1-scores of 90.87% on EEGEyeNet and 90.47% on PhyAAt, showing a significant performance gap compared to the EEGMind-Transformer. Traditional models such as Random Forest and SVM-RBF displayed lower sensitivity and precision, highlighting their limitations in handling the spatiotemporal complexity of EEG data. These results underscore the effectiveness of the EEGMind-Transformer's dynamic temporal graph attention mechanism in capturing subtle but critical features, which contributes to its superior performance. The high sensitivity ensures that the model captures most positive samples, while the high specificity indicates a strong ability to filter out false positives. The balanced precision and F1-scores demonstrate its robustness in handling potentially imbalanced data distributions, making it suitable for real-world applications in EEG-based health monitoring.

Table 9 outlines the correspondence between the mental health classification labels used in this study and the associated EEG features. The classification labels were derived from validated datasets, including DEAP, PhyAAt, EEGEyeNet, and eSports, and were based on clinically or experimentally relevant criteria. For instance, valence and arousal were defined using self-reported scores on a 9-point Likert scale from the DEAP dataset, while stress levels were categorized using heart rate variability (HRV) and self-reported scores from the PhyAAt dataset. Similarly, cognitive load and task engagement were based on task performance and subjective ratings from the EEGEyeNet and eSports datasets, respectively. The EEG features associated with these labels reflect well-documented neurophysiological patterns. For example, frontal asymmetry in the alpha band is linked to valence, while arousal is associated with increased beta and gamma activity and reduced alpha power. Stress levels are characterized by elevated theta and alpha activity in the prefrontal cortex (PFC) and altered functional connectivity between the PFC and the amygdala. Cognitive load and task engagement involve distinct patterns of beta and theta activity, particularly in the frontal and parietal regions. These well-established associations provide a neurophysiological basis for the classification process, enhancing the interpretability and clinical relevance of the results.
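One of these feature-label links, frontal alpha asymmetry as a valence marker, is simple enough to compute directly. The sketch below assumes two frontal channels (F3 left, F4 right, following the 10-20 system) at a hypothetical 128 Hz sampling rate and uses the conventional ln(right) - ln(left) alpha-power index; it illustrates the feature, not the paper's extractor.

```python
import numpy as np
from scipy.signal import welch

def alpha_power(signal, fs=128.0):
    """Alpha-band (8-12 Hz) power of a single channel via the Welch PSD."""
    freqs, psd = welch(signal, fs=fs, nperseg=int(2 * fs))
    mask = (freqs >= 8.0) & (freqs <= 12.0)
    return np.trapz(psd[mask], freqs[mask])

# Frontal alpha asymmetry: ln(P_alpha at F4) - ln(P_alpha at F3).
rng = np.random.default_rng(0)
f3, f4 = rng.standard_normal((2, 128 * 60))   # toy one-minute recordings
asymmetry = np.log(alpha_power(f4)) - np.log(alpha_power(f3))
print(f"frontal alpha asymmetry: {asymmetry:.3f}")
```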
Figure 8 illustrates the spatiotemporal feature heatmaps learned by the Dynamic Temporal Graph Attention Mechanism (DT-GAM). The heatmaps depict the temporal attention weights assigned to different time intervals of EEG signals under high and low arousal conditions. For the high arousal condition, the model prioritizes EEG segments corresponding to task transitions and heightened neural activity, whereas the low arousal condition shows comparatively distributed attention weights. These visualizations demonstrate the model's capability to dynamically focus on the most relevant temporal features for classification. Figure 9 visualizes the importance scores of different brain regions as derived from the adjacency matrices learned by the Hierarchical Graph Representation and Analysis (HGRA) module. In the stress-level classification task, the prefrontal cortex and amygdala exhibit the highest importance scores, highlighting their central roles in stress regulation and emotional processing. These visualizations provide interpretable evidence that the model's learned features align with established neuroscientific findings, emphasizing its utility in mental health monitoring applications.

6 Conclusion and discussion

This study aimed to address the challenge of classifying mental health states based on EEG signals, particularly in handling complex spatiotemporal dependencies and diverse neural activity patterns. We proposed the EEGMind-Transformer model, which integrates a Dynamic Temporal Graph Attention Mechanism (DT-GAM), a Hierarchical Graph Representation and Analysis (HGRA) module, and a Spatial-Temporal Fusion Module (STFM) to effectively capture the spatiotemporal features within EEG data. In the experiments, we used several representative datasets, including EEGEyeNet, PhyAAt, eSports Sensors, and DEAP, and compared the model's performance against six state-of-the-art (SOTA) methods. The results demonstrated that the EEGMind-Transformer significantly outperformed the other methods across key metrics.

The potential applications of this model span several fields. In mental health monitoring, the EEGMind-Transformer can be deployed in wearable EEG devices for continuous stress monitoring, early detection of depression, and tracking mental health trends over time. Such applications are particularly relevant for telehealth platforms, where clinicians can remotely monitor patients and receive actionable insights based on EEG-based biomarkers. Additionally, the model's ability to detect cognitive load makes it suitable for adaptive learning systems in educational contexts, where real-time analysis of cognitive states can guide personalized content delivery to optimize learning outcomes. In human-computer interaction (HCI), the model can facilitate brain-computer interface (BCI) applications, such as hands-free control of devices in gaming, assistive technologies for individuals with disabilities, or enhanced user experience design in immersive environments. Furthermore, the model has practical applications in workplace stress management, where it can be used to monitor operators in high-stress occupations like air traffic control or emergency response. By providing real-time feedback and stress mitigation strategies, it supports both performance optimization and well-being.

One key limitation of the model lies in its reliance on high-quality, artifact-free EEG data. While the model performs well on preprocessed datasets, its robustness to noise or artifacts in real-world clinical EEG data remains a challenge. Future work could focus on enhancing the model's resilience to common EEG artifacts, such as muscle and movement artifacts, by incorporating data augmentation techniques or developing adaptive filtering layers that operate within the model to manage noise. Additionally, while our preliminary results demonstrate the model's effectiveness, long-term stability across diverse patient populations and mental health conditions has yet to be fully validated. Further testing is necessary to confirm the model's stability and reliability over extended monitoring periods. Another area for future improvement involves validating the model's generalizability across different environments, particularly for remote or mobile health applications where EEG data quality and environmental factors may vary widely. Conducting experiments under varied conditions, such as differing ambient noise levels, electrode types, and user movement, could provide insights into the model's adaptability and inform adjustments needed for robust performance outside controlled settings. Finally, future directions could also include the integration of multi-modal data, such as physiological or behavioral metrics, to enhance the model's diagnostic capability. By fusing EEG data with other biological signals, the model could achieve a more holistic understanding of mental health conditions, thus broadening its applicability and improving its robustness in diverse settings.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

Author contributions

ZL: Writing – original draft, Writing – review & editing. JZ: Writing – original draft, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was supported by the 2021 Provincial Quality Engineering Teaching Research Project for Higher Education Institutions (Key) "Research on the Implementation Path of Course Ideological and Political
References

Ai, S., Ding, H., Ping, Y., Zuo, X., and Zhang, X. (2023). Exploration of digital transformation of government governance under the information environment. IEEE Access 11, 78984–78993. doi: 10.1109/ACCESS.2023.3297887

Akter, M., Islam, N., Ahad, A., Chowdhury, M. A., Apurba, F. F., Khan, R., et al. (2024). An embedded system for real-time atrial fibrillation diagnosis using a multimodal approach to ECG data. Eng 5, 2728–2751. doi: 10.3390/eng5040143

Alhussein, M., Muhammad, G., Alshehri, F. M., et al. (2019). A support vector machine-based method for monitoring mental fatigue using EEG signals. Comput. Biol. Med. 107, 20–29. Available at: https://www.mdpi.com/2076-3417/9/20/4402

Cassani, R., Estarellas, M., San-Martin, R., Fraga, F. J., and Falk, T. H. (2018). EEG as a measure of mental health states: a review of its application in clinical and non-clinical contexts. IEEE J. Biomed. Health Inform. 22, 1559–1572. doi: 10.1155/2018/5174815

Craig, S., and Tran, S. (2020). The challenge of EEG-based mental state classification and its impact on real-time applications. Comput. Intell. Neurosci. 2020:9856598. Available at: https://www.frontiersin.org/journals/neurorobotics/articles/10.3389/fnbot.2020.00025/full

Craik, A., He, Y., and Contreras-Vidal, J. L. (2019). Advances in machine learning for EEG signal analysis in brain–computer interfaces. Neural Netw. 110, 116–129. Available at: https://www.researchgate.net/profile/Xiang-Zhang-104/publication/333802253_Short_Version_A_Survey_on_Deep_Learning_based_Brain-Computer_Interface_Recent_Advances_and_New_Frontiers/links/5d04f78c92851c90043da583/Short-Version-A-Survey-on-Deep-Learning-based-Brain-Computer-Interface-Recent-Advances-and-New-Frontiers.pdf

Delorme, A., and Makeig, S. (2004). EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neurosci. Methods 134, 9–21. doi: 10.1016/j.jneumeth.2003.10.009

Gao, X., Lin, Y., and Liu, F. (2021). Monitoring of mental health conditions using EEG: a systematic review. Front. Neurosci. 15:684765. Available at: https://www.mdpi.com/1424-8220/21/10/3461

Goswami, R., Srivastava, G., and Mishra, S. (2022). EEG-based assessment of mental health: a review of computational approaches and clinical implications. Comput. Math. Methods Med. 2022:8103154. Available at: https://www.mdpi.com/1424-8220/21/15/5043

Guo, Y., Cao, X., Wang, X., and Yang, J. (2019). A novel approach for time series classification based on SVM. Pattern Recognit. Lett. 119, 209–216. Available at: https://ieeexplore.ieee.org/abstract/document/8635553

He, F., Bai, K., Zong, Y., Zhou, Y., Jing, Y., Wu, G., et al. (2023). Makeup transfer: a review. IET Comput. Vis. 17, 513–526. doi: 10.1049/cvi2.12142

Hong, Q., Dong, H., Deng, W., and Ping, Y. (2024). Education robot object detection with a brain-inspired approach integrating Faster R-CNN, YOLOv3, and semi-supervised learning. Front. Neurorobot. 17:1338104. doi: 10.3389/fnbot.2023.1338104

Karim, F., Majumdar, S., Darabi, H., and Chen, S. (2018). "LSTM fully convolutional networks for time series classification," in 2018 IEEE International Conference on Big Data (Big Data) (IEEE), 2475–2482. Available at: https://ieeexplore.ieee.org/abstract/document/8141873

Kosaraju, V., Sadeghian, A., Martín-Martín, R., Reid, I., Rezatofighi, H., and Savarese, S. (2019). Social-BiGAT: multimodal trajectory forecasting using Bicycle-GAN and graph attention networks. Adv. Neural Inf. Process. Syst. 32. Available at: https://proceedings.neurips.cc/paper_files/paper/2019/hash/d09bf41544a3365a46c9077ebb5e35c3-Abstract.html

Krigolson, O. E., Williams, C. C., and Norton, A. (2017). Using neurophysiological data to inform mental health research: EEG and ERP methods. Front. Psychol. 8:2017. Available at: https://www.sciencedirect.com/science/article/pii/S0167876016301180

Kumar, R., and Mittal, N. (2018). EEG feature extraction and classification using power spectral density and support vector machine. Int. J. Electron. Commun. Eng. 6, 77–82. Available at: https://ietresearch.onlinelibrary.wiley.com/doi/abs/10.1049/iet-spr.2017.0140

Kwon, M., Lee, J., Kim, D.-J., and Kim, S. H. (2019). A deep learning framework for classification of EEG signals with motor imagery in brain–computer interface. Expert Syst. Appl. 137, 12–25.

Lakshminarayanan, K., Shah, R., Daulat, S. R., Moodley, V., Yao, Y., Madathil, D., et al. (2023). The effect of combining action observation in virtual reality with kinesthetic motor imagery on cortical activity. Front. Neurosci. 17:1201865. doi: 10.3389/fnins.2023.1201865

LaRocco, J., Tahmina, Q., Lecian, S., Moore, J., Helbig, C., and Gupta, S. (2023). Evaluation of an English language phoneme-based imagined speech brain computer interface with low-cost electroencephalography. Front. Neuroinform. 17:1306277. doi: 10.3389/fninf.2023.1306277

Lawhern, V. J., Solon, A. J., Waytowich, N. R., Gordon, S. M., Hung, C. P., Lance, B. J., et al. (2018). EEGNet: a compact convolutional neural network for EEG-based brain-computer interfaces. J. Neural Eng. 15:056013. doi: 10.1088/1741-2552/aace8c

Li, Y., Wu, G., Gao, H., and Ma, Y. (2020). Deep learning in bioinformatics: introduction, application, and perspective in the big data era. Methods 166, 1–3. doi: 10.1016/j.ymeth.2019.04.008

Liaw, A., and Wiener, M. (2002). Classification and regression by randomForest. R News 2, 18–22.

Lotte, F., Congedo, M., Lécuyer, A., Lamarche, F., and Arnaldi, B. (2018). A review of classification algorithms for EEG-based brain–computer interfaces: a 10-year update. J. Neural Eng. 15:031005. doi: 10.1088/1741-2552/aab2f2

Michelmann, S., Grace, E., and Smith, L. (2020). Data-driven classification of psychophysiological signals for mental state monitoring. J. Neural Eng. 17:036014. Available at: https://ieeexplore.ieee.org/abstract/document/10521576

Parisot, S., Ktena, S. I., Ferrante, E., Lee, M., Glocker, B., Rueckert, D., et al. (2018). "Graph convolutional networks for brain network analysis," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 79–87. Available at: https://www.sciencedirect.com/science/article/pii/S0010482520304273

Plis, S. M., Hjelm, R. D., Salazar, A., Turner, J., Schirner, M., et al. (2018). Multimodal approaches for EEG-fMRI analysis: benefits and challenges. Front. Neurosci. 12:156. Available at: https://www.frontiersin.org/articles/10.3389/fnhum.2018.00029/full

Roy, Y., Banville, H., Albuquerque, I., Gramfort, A., Falk, T. H., Faubert, J., et al. (2019). Deep learning for EEG-based classification: a review. J. Neural Eng. 16:051001. doi: 10.1088/1741-2552/ab260c

Schirrmeister, R. T., Springenberg, J. T., Fiederer, L. D. J., Glasstetter, M., Eggensperger, K., Tangermann, M., et al. (2017). Deep learning with convolutional neural networks for EEG decoding and visualization. Hum. Brain Mapp. 38, 5391–5420. doi: 10.1002/hbm.23730

Sihag, K., Arora, A., Sharma, S., and Ahuja, K. (2022). Temporal dynamics in graph-based methods for EEG analysis: a review. Front. Comput. Neurosci. 16:867382. Available at: https://arxiv.org/abs/2408.06027

Simar, C., Colot, M., Cebolla, A.-M., Petieau, M., Cheron, G., Bontempi, G., et al. (2024). Machine learning for hand pose classification from phasic and tonic EMG signals during bimanual activities in virtual reality. Front. Neurosci. 18:1329411. doi: 10.3389/fnins.2024.1329411

Song, T., Zheng, W., Tang, Z., and Lin, Z. (2021). Graph-based deep learning for EEG signal classification. IEEE Trans. Neural Syst. Rehabil. Eng. 29, 188–197. Available at: https://www.mdpi.com/1424-8220/21/14/4758

Stahl, B. C., Timmermans, J., and Mittelstadt, B. (2019). Ethical considerations for brain-computer interfaces in mental health monitoring. J. Med. Ethics 45, 597–604. doi: 10.1136/medethics-2018-105313

Sturm, I., Haarmann, H., and Bannach, D. (2021). On the interpretability of deep learning models for EEG-based brain-computer interfaces: a review. IEEE Trans. Neural Syst. Rehabil. Eng. 29, 282–297. Available at: https://link.springer.com/article/10.1007/s00521-020-05624-w

Tsiouris, K. M., Pezoulas, V. C., Zervakis, M., Karatzanis, I., Tzallas, A. T., et al. (2018). Deep learning approaches for improving classification performance of EEG-based cognitive load assessment models. IEEE Trans. Neural Netw. Learn. Syst. 29, 2809–2818. Available at: https://link.springer.com/chapter/10.1007/978-3-030-04021-5_6

Varatharajan, N., Mahajan, A., and Goel, H. (2022). Real-world applications of EEG-based mental health monitoring systems. IEEE Access 10, 42615–42627. Available at: https://ieeexplore.ieee.org/abstract/document/6824740

Wan, Z., Li, M., Liu, S., Huang, J., Tan, H., Duan, W., et al. (2023). EEGformer: a transformer-based brain activity classification method using EEG signal. Front. Neurosci. 17:1148855. doi: 10.3389/fnins.2023.1148855

Wu, F., Souza, A., Zhang, T., Fifty, C., Yu, T., Weinberger, K., et al. (2019). "Simplifying graph convolutional networks," in International Conference on Machine Learning (PMLR), 6861–6871. Available at: https://proceedings.mlr.press/v97/wu19e

Wu, S., Wang, J., Ping, Y., and Zhang, X. (2022). "Research on individual recognition and matching of whale and dolphin based on EfficientNet model," in 2022 3rd International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE) (Xi'an: IEEE), 635–638. doi: 10.1109/ICBAIE56435.2022.9985881

Zhao, Q., Zhao, L., and Cai, F. (2019). LSTM recurrent neural networks for multiple pattern recognition in EEG signals. IEEE Trans. Neural Syst. Rehabil. Eng. 27, 1786–1795. Available at: https://pdfs.semanticscholar.org/6a4d/4bf50c412635335af1b8444d427d9c3e8c86.pdf