
IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, VOL. 32, 2024

Integrating Large Language Model, EEG, and Eye-Tracking for Word-Level Neural State Classification in Reading Comprehension

Yuhong Zhang, Graduate Student Member, IEEE, Qin Li, Sujal Nahata, Tasnia Jamal, Shih-Kuen Cheng, Gert Cauwenberghs, Fellow, IEEE, and Tzyy-Ping Jung, Fellow, IEEE

Abstract— With the recent proliferation of large language models (LLMs), such as Generative Pre-trained Transformers (GPT), there has been a significant shift in exploring human and machine comprehension of semantic language meaning. This shift calls for interdisciplinary research that bridges cognitive science and natural language processing (NLP). This pilot study aims to provide insights into individuals' neural states during a semantic inference reading-comprehension task. We propose jointly analyzing LLMs, eye-gaze, and electroencephalographic (EEG) data to study how the brain processes words with varying degrees of relevance to a keyword during reading. We also use feature engineering to improve the fixation-related EEG data classification while participants read words with high versus low relevance to the keyword. The best validation accuracy in this word-level classification is over 60% across 12 subjects. Words highly relevant to the inference keyword received significantly more eye fixations per word: 1.0584 compared to 0.6576, including words with no fixations. This study represents the first attempt to classify brain states at a word level using LLM-generated labels. It provides valuable insights into human cognitive abilities and Artificial General Intelligence (AGI), and offers guidance for developing potential reading-assisted technologies.

Index Terms— Large language model, brain–computer interface, human–computer interface, EEG, eye-fixation, cognitive computing, pattern recognition, reading comprehension, computational linguistics.

I. INTRODUCTION

RECENT advancements in LLMs and generative AI have significantly impacted various aspects of human society and industry. Notable examples include the GPT and Llama models, developed by OpenAI and Meta, among others [1], [2], [3], [4]. As artificial agents improve their proficiency, it becomes increasingly crucial to deepen our understanding of the intersection between Machine Learning (ML), decision-making processes, and human cognitive functions [5]. For instance, both humans and machines employ strategies for semantic inference. Humans extract crucial information from texts via specific gaze patterns during reading [6], [7], [8], whereas language models predict subsequent words using contextual cues [9]. Therefore, this pilot study raises the question: Can we differentiate individuals' mental states when their gaze fixates on words of varying significance within a sentence, particularly at a word level, during tasks involving semantic inference and reading comprehension?

The success of the prediction tasks could have significant implications for current AI applications in both science and rehabilitation technology. This includes human-in-the-loop machine learning [10], brain-computer interfaces (BCI) for text communications [11], real-time personalized learning and accessibility tools [12], and cognitive training programs [13], which could be tailored to healthy individuals or patients. For example, stroke survivors may experience "acquired dyslexia" or "alexia," with or without other language challenges. Treatment strategies could involve compensatory techniques and BCI technology to assist with

Manuscript received 17 February 2024; revised 1 May 2024 and 27 June 2024; accepted 23 July 2024. Date of publication 14 August 2024; date of current version 20 September 2024. (Corresponding author: Tzyy-Ping Jung.)
Yuhong Zhang is with the Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, CA 92093 USA (e-mail: [email protected]).
Qin Li is with the Department of Bioengineering, University of California at Los Angeles, Los Angeles, CA 90095 USA (e-mail: [email protected]).
Sujal Nahata is with the Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093 USA (e-mail: [email protected]).
Tasnia Jamal is with the Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA 92093 USA (e-mail: [email protected]).
Shih-Kuen Cheng is with the Institute of Cognitive Neuroscience, National Central University, Taoyuan 32001, Taiwan (e-mail: [email protected]).
Gert Cauwenberghs is with the Shu Chien-Gene Lay Department of Bioengineering and the Institute for Neural Computation, University of California San Diego, La Jolla, CA 92093 USA (e-mail: [email protected]).
Tzyy-Ping Jung is with the Institute for Neural Computation, University of California San Diego, La Jolla, CA 92093 USA (e-mail: [email protected]).
This article has supplementary downloadable material available at https://doi.org/10.1109/TNSRE.2024.3435460, provided by the authors.
Digital Object Identifier 10.1109/TNSRE.2024.3435460

© 2024 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License.
For more information, see https://creativecommons.org/licenses/by/4.0/

reading, thus connecting our findings to practical rehabilitation scenarios.

Previous studies have demonstrated biomarkers that reflect consistent patterns in subjects during reading comprehension tasks. For example, several neurobiological markers linked to reading comprehension, including the P300 and N400, were first identified in the 1980s [14]. Groundbreaking research in reading comprehension revealed distinct N400 patterns for "semantically moderate" and "semantically strong" words [15].

Furthermore, classical theories within the cognitive science community aim to elucidate and delineate the processes through which humans comprehend text and make inferences. Kintsch [16] introduced the Construction-Integration (CI) model, which posits text comprehension as a two-stage process: initially constructing a textbase (comprehending the text at the surface and propositional level) and subsequently integrating it with prior knowledge to form a situation model (a mental representation of the text's content). Evans [17] suggests that cognition comprises two types of processes: automatic (Type 1) and deliberative (Type 2). The automatic process operates swiftly and relies on heuristics, whereas the deliberative process is slower, conscious, and grounded in logical reasoning. Similar orthodox theories of text comprehension include Mental Models [18], among others. While these theories in cognitive science offer valuable insights into text comprehension and inference, they often oversimplify cognitive processes and do not fully account for individual differences and context variability [19].

With the advancement of ML algorithms, BCI technologies [20], and NLP techniques [21], conducting studies on reading comprehension in natural settings has become increasingly feasible. Various signal modalities are employed in cognitive studies to investigate subjects' mental states, including electroencephalography (EEG) [22], functional magnetic resonance imaging (fMRI) [23], magnetoencephalography (MEG) [24], positron emission tomography (PET) [25], and eye-tracking methods [26]. For our study, because of its high temporal and spatial resolution and non-invasive properties, we specifically employ high-density EEG. In particular, Hollenstein et al. [27] recorded simultaneous EEG and eye-tracking data while subjects engaged in sentence reading tasks, suggesting that integrating these technologies with NLP tools holds significant potential. This integration enables us to delve deeply into the natural reading process, potentially paving the way for developing real-time reading monitors and converting everyday reading materials into computationally analyzable formats [28], [29].

This study uses the Zurich Cognitive Language Processing Corpus version 1.0 (ZuCo) dataset [27] to explore potential patterns distinguishing two specific mental states—those triggered when subjects fixate on semantically salient words (High-Relevance Words, or HRWs) and less significant words (Low-Relevance Words, or LRWs)—during ZuCo 1.0's Task 3, which is centered on semantic inference. The main contribution of this study lies in the unique integration of NLP methods, EEG, and eye-tracking biomarker analysis across multiple information modalities. Prior work by [21] used seven NLP methods to build a comprehensive model for extracting keywords from sentences, employing deep neural networks for binary classification. However, the inflexibility of the embedded NLP model and the extreme data imbalance between the two classes resulted in significant over-fitting during the training of the classification model. As an improvement, this study uses advanced LLMs, such as GPT-4, to generate robust ground truths for HRWs and LRWs relative to the inference keyword target. These ground truths are the foundation for extracting word-level EEG time series data for 12 subjects.

Given the exploratory nature of this research as a pilot study, the overall classification results exceeding 60% show that the joint utilization of EEG and eye-tracking data is a viable biomarker for classifying whether subjects detect words of significant meaning in inference tasks. This study represents the first attempt to use LLMs for labeling word relevance, which is then integrated with EEG signal analysis to explain potential patterns in human comprehension and inference-making, specifically concerning words with substantial meaning.

The remainder of this study is organized as follows: Section II presents the dataset used in our study, including subject information, experiment paradigms, and the data collection process and equipment. Section III explains our data processing pipeline, involving the EEG feature extraction pipeline and classification algorithms. Section IV exhibits our LLM comparison, eye-fixation statistics, fixation-related potentials, and classification results for 12 subjects across eight keyword relations, with the corresponding analysis. Lastly, in Section V, we juxtapose our findings with existing literature, deliberate on the challenges of our study, and propose potential avenues for future research.

II. DATASET

The ZuCo dataset consists of high-density EEG and eye-tracking data from 12 native English speakers, aged between 22 and 54 years. It captures 21,629 words, 1,107 sentences, and 154,173 fixations collected over 4-6 hours of natural text reading. Participants completed the reading tasks in two sessions, each lasting 2-3 hours and held at the same time of day. The sequence started with Task 2, where participants read Wikipedia sentences about relationships, followed by the first half of Task 1, a sentiment analysis task. The second session began with Task 3, which involved reading specific relational content on Wikipedia, and concluded with the second half of Task 1.

Data collection took place in a controlled environment. EEG data were recorded using a 128-channel Geodesic HydroCel system (Eugene, Oregon) with a sampling rate of 500 Hz and a bandpass filter set from 0.1 to 100 Hz, although only 105 channels were used. Impedance was maintained below 40 kOhm. Originally recorded with a reference at Cz, the EEG data were later re-referenced to the average of the mastoid channels for our study. Eye movements and pupil sizes were captured using an EyeLink 1000 Plus eye tracker, which also operated at a sampling rate of 500 Hz.

We focused on Task 3 of the ZuCo dataset, which involves reading sentences from the Wikipedia corpus that include

Fig. 1. Overview of Task 3 Experimental Design and Language Model-Driven Classification. (a) Setup for Task 3: Subjects read sentences
with relational keywords on-screen, while their eye movements and EEG responses were tracked. They determined if the highlighted relation
appeared in each sentence. (b) LLM Output: Displays a sentence exhibiting the “AWARD” relation with words categorized by high- and low-relevance
in red and blue font colors. (c) Classification Pipeline: Sentences are analyzed by language models to sort words by relevance. Eye-tracking data
aligns with EEG for feature extraction, culminating in a binary classification of word relevance.

specific semantic relations such as job titles, educational affiliations, political affiliations, nationalities, and awards. Participants were required to identify whether each sentence contained a predetermined relation, answering control questions to confirm their responses. This task achieved the highest mean accuracy score, 93.16%, among participants. For our analysis, we selected eight of the nine word relations in Task 3, excluding the 'VISITED' relation due to its ambiguous interpretability. Of the original 407 sentences, 356 were used. Specific participants missed certain relations; for example, ZGW missed 'JOB,' ZKB missed 'WIFE,' and ZPH missed 'POLITICAL AFFILIATION' and 'WIFE.' Task 3 sentences were presented one at a time on a screen, with participants briefed beforehand on the specific relations to focus on. Practice rounds were conducted before the actual data recording to ensure understanding of the task requirements. Fig. 1 (a) illustrates Task 3.

Eye-tracking data were processed to identify saccades, fixations, and blinks. Fixations, defined as periods of stable gaze, were adjusted using a Gaussian mixture model on the y-axis to ensure accurate alignment with text lines. This adjustment facilitated the precise mapping of eye fixations to corresponding lines of text. EEG data were preprocessed using the Automagic software, which included importing data into MATLAB, extracting triggers, and identifying bad electrodes. The data were high-pass filtered at 0.5 Hz and notch filtered between 49 and 51 Hz to minimize line-frequency interference. The pipeline then regressed the EOG channels out of the scalp EEG to eliminate eye artifacts and performed artifact rejection using the Multiple Artifact Rejection Algorithm (MARA). Next, the EEG signals were synchronized with the eye-tracking data to segment the EEG data corresponding to word fixations. Aligning fixations with word boundaries and line allocations, the ZuCo authors extracted and segmented EEG data around each fixation time.

Our study analyzed eye-fixation and EEG data features for both HRWs and LRWs. The eye-fixation features are gaze duration (GD), total reading time (TRT), first fixation duration (FFD), single fixation duration (SFD), and go-past time (GPT). For eye-fixation features, we used the data directly from ZuCo; for EEG, we extracted our own features from ZuCo's preprocessed data. For additional details on the data collection methodology and protocols, readers are referred to the original ZuCo study [27].

III. METHOD

A. LLM and Word Extraction

OpenAI's GPT-3.5-turbo (hereafter referred to as GPT-3.5) and GPT-4, along with Meta's LLaMA (with 65 billion parameters), are at the forefront of NLP technology. GPT-3.5 and GPT-4 are equipped with approximately 175 billion and 1.8 trillion parameters, respectively, and excel in text generation tasks [4]. Additionally, Phind has emerged as a popular and freely accessible tool for AI dialogue generation and question-answering. These models and tools collectively epitomize the current state of the art in language understanding and generation. We employ all four models on the Task 3 corpus for initial semantic analysis and sanity checks. However, in the main analysis of this study, focusing on EEG and eye-fixation data, only GPT-3.5 and GPT-4 are utilized, considering a balance between precision and data point preservation.

We input the following prompt to all LLMs to extract HRWs and LRWs:

Algorithm 1 Grouping Words and Extracting EEG Epochs Using LLMs

Require: SentenceTable, WordEEGSegment
Ensure: WordGroups, Mistakes, EEGGroups
1: Initialize: Mistakes, TempWords, WordGroups, EEGGroups
2: Models ← ['GPT-3.5 Turbo', 'GPT-4', 'LLaMA', 'Phind']
3: Relations ← ['AWARD', 'EDUCATION', . . . , 'WIFE']
4: NaturalPrompt ← ['prompt 1']
5: ForcedPrompt ← ['prompt 2']
6: for model in Models do
7:   CurrentModel ← API(model)
8:   for relation in Relations do
9:     InputRelation ← relation
10:    for idx in 1:length(SentenceTable) do
11:      InputAnswer, InputSentence ← SentenceTable[idx]
12:      OutputAnswer, OutputWords ← CurrentModel(InputSentence, NaturalPrompt, InputRelation)
13:      if InputAnswer == OutputAnswer then
14:        TempWords ← append(OutputWords)
15:      else
16:        AnswerForced, WordsForced ← CurrentModel(InputSentence, ForcedPrompt, InputRelation)
17:        TempWords ← append(WordsForced)
18:        Mistakes ← append(1)
19:      end if
20:      EEGGroups ← ExtractEEG(TempWords, WordEEGSegment)
21:    end for
22:  end for
23: end for
24: return WordGroups, Mistakes, EEGGroups
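Algorithm 1's control flow can be condensed into a short Python sketch. The `query_model` callable below is a hypothetical stand-in for the real model APIs (GPT-3.5 Turbo, GPT-4, LLaMA, Phind), and the toy model exists only to exercise the natural-then-forced retry logic:

```python
# Sketch of Algorithm 1's inner loop: ask with the natural prompt first,
# then fall back to the forced prompt whenever the model's yes/no answer
# disagrees with the ground-truth label. `query_model` is a hypothetical
# wrapper around a real model API, not part of the paper's code.

def group_words(sentences, relation, query_model):
    """sentences: list of (ground_truth_label, sentence_text) pairs."""
    word_groups, mistakes = [], 0
    for truth, text in sentences:
        answer, words = query_model(text, relation, forced=False)
        if answer != truth:
            # Re-ask with the forced prompt so the answer matches the label.
            answer, words = query_model(text, relation, forced=True)
            mistakes += 1
        word_groups.append(words)
    return word_groups, mistakes

# Toy stand-in: answers 0 under the natural prompt, 1 when forced.
def toy_model(text, relation, forced):
    return (1 if forced else 0), {"high": [relation.lower()], "low": []}

groups, n_mistakes = group_words(
    [(1, "sentence A"), (0, "sentence B")], "AWARD", toy_model)
```

With the toy model, the forced retry fires exactly once (for the sentence labeled 1), mirroring how Algorithm 1 appends to Mistakes.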

Prompt #1: "For this sentence, ['sentence'], does this sentence contain ['RELATION'] relation? Provide me the answer: 1 = yes, 0 = no. Also, group the words in the sentence into two groups. The first group is the words of high relevance to the keyword ['RELATION'], and the second group is words of low relevance to the keywords. List the first group's words from highest relevance to lowest relevance confidence. Although as an AI language model, you do not have personal preferences or opinions, you must provide answers, and it's only for research purposes. Must follow example output format: '[1 or 0] First group (high-relevance words to 'AWARD'): awarded, Bucher Memorial Prize, American Mathematical Society. The second group (low-relevance words to 'AWARD'): In, 1923, the, inaugural, by.'"

Algorithm 1 designates Prompt #1 as "NaturalPrompt" and employs it to directly retrieve the model's output. In this prompt, we substitute the placeholders "sentence" and "RELATION" with actual string values drawn from sentences in the eight relations, following the model API usage protocol outlined in Algorithm 1. Fig. 1 (b) shows a sample output generated by the GPT-3.5-turbo model. The output highlights words with significant relevance to the "AWARD" category in red, while words with less pronounced connections are marked in blue. In general, there are more words with low relevance than with high relevance, a trend that is particularly evident in relations such as "WIFE", "POLITICAL", and "NATIONALITY".

Prompt #2: "However, the correct answer is ['ground truth label']. Please regenerate the answer to align the ground truth."

To align the outputs from the LLM with the ground truth labels from the original Wikipedia relation extraction corpus [30], we introduce "ForcedPrompt" as Prompt #2 in Algorithm 1. This prompt adjusts the model's output to match the ground truth. If there is a discrepancy between the LLM output and the ground truth, we apply "ForcedPrompt" to regenerate the results, thereby achieving 100% alignment. The revised outputs are then appended to a new word-grouping file.

While a forced-response prompt can achieve 100% accuracy in condition checks, the unsupervised generation of HRW and LRW groups may introduce bias. To mitigate this, our study employs a dual-model approach using GPT-3.5 and GPT-4 rather than relying on a single language model. We enhance the signal-to-noise ratio (SNR) within the HRW-LRW dataset through a joint selection process across all generated datasets, i.e., we select as HRWs only the words that belong to both models' HRW groups.

B. Physiological Data Processing

1) Pipeline Overview: Fig. 1 (c) depicts an overview of the EEG data processing pipelines. After the joint selection of the HRW and LRW word groups, we extract the eye fixations and fixation-locked EEG data for binary classification tasks. To improve the SNR, we employed feature extraction methods across the domains of spectrum analysis, information theory, connectivity networks, and their combined features.
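The joint selection step described above amounts to a set intersection over the two models' HRW lists. A minimal sketch, with word lists invented for illustration:

```python
# Joint selection of HRWs across two models: a word is kept as
# high-relevance only if BOTH GPT-3.5 and GPT-4 placed it in their HRW
# group; every other word in the sentence becomes low-relevance.
# The word lists below are invented for illustration.

def joint_select(sentence_words, hrw_gpt35, hrw_gpt4):
    agreed = set(hrw_gpt35) & set(hrw_gpt4)
    hrw = [w for w in sentence_words if w in agreed]   # keeps sentence order
    lrw = [w for w in sentence_words if w not in agreed]
    return hrw, lrw

words = ["In", "1923", "he", "won", "the", "Nobel", "Prize"]
hrw, lrw = joint_select(words,
                        hrw_gpt35=["won", "Nobel", "Prize"],
                        hrw_gpt4=["Nobel", "Prize", "awarded"])
```

Only "Nobel" and "Prize" survive the intersection here; "won" (flagged by one model only) falls into the LRW group, which is exactly the conservatism that trades data points for label precision.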

An embedded classifier architecture was utilized, incorporating established classifiers such as Support Vector Machine (SVM) and Discriminant Analysis. For Fixation-Related Potential (FRP) analysis, EEG signal extraction was restricted to a predefined time window for each word, ranging from 100 ms pre-fixation to 400 ms post-fixation.

2) FRP Analysis: In contrast to one-dimensional ERP averages, which can obscure dynamic information and inter-trial variability [31], we employed ERPimage for a two-dimensional representation that allows for trial-by-trial analysis. Utilizing the ERPimage.m function in the eeglab toolbox (MATLAB 2023b, EEGLAB 2020), we generated FRPs for both HRWs and LRWs across 12 subjects. A smoothing parameter of 10 was applied to enhance the clarity of the FRP image, which spans a temporal window from 100 ms pre-fixation to 400 ms post-fixation, resulting in a comprehensive ERP signal duration of 500 ms.

3) EEG Feature Extraction:

a) Band power: We calculated the power in five EEG frequency bands: delta (0.5-4 Hz), theta (4-8 Hz), alpha (8-13 Hz), beta (13-30 Hz), and gamma (30-64 Hz). We employed MATLAB's "bandpower" function from the Signal Processing Toolbox. The band power (BP) P_{a,b} is computed as follows:

P_{a,b} = \int_a^b P(\omega)\,d\omega = \int_a^b |F(\omega)|^2\,d\omega   (1)

where P_{a,b} represents the power in the frequency band [a, b], P(\omega) denotes the power spectral density, |F(\omega)|^2 is the squared magnitude of the Fourier transform, and a and b are the lower and upper bounds of the frequency band, respectively. The EEG data comprised 105 channels, resulting in 525 feature variables per trial. To address the challenge posed by this extensive variable set, many of which exhibited redundancy, we used Principal Component Analysis (PCA) to reduce the dimensionality of the data to 30 variables.

b) Conditional entropy: This study used conditional entropy (CondEn) to extract features of each EEG trial. It serves as a metric quantifying the level of mutual information between two random variables [32]. The mutual information between two discrete random variables is defined as follows:

I(X; Y) = \sum_{y \in Y} \sum_{x \in X} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}   (2)

where p(x) is the approximate density function. By employing this approach, the mutual information I(X; Y) is computed, establishing its connection with the CondEn H(X|Y):

H(X|Y) = -\sum_{y \in Y} p(y) \sum_{x \in X} p(x|y) \log_2 p(x|y)   (3)

where H(X|Y) is the CondEn of X given Y, p(y) is the probability of occurrence of a value y from Y, p(x|y) is the conditional probability of x given y, and the sums are performed over all possible values of x in X and y in Y. For 105 EEG channels, we generate a 105-by-105 CondEn matrix. This matrix is asymmetric because mutual information and CondEn measure different aspects of the relationship between X and Y. Flattening this matrix results in over 10,000 feature variables. To manage this high dimensionality, we focus on one half of the matrix and apply PCA to reduce the feature space to 30 principal components.

c) Connectivity network: The human brain is an expansive and intricate network of electrical activity [33]. Understanding the intricate connections within the brain and quantifying its connectivity has garnered increasing interest [34], [35], [36]. This study employed the Phase Locking Value (PLV) to construct a weighted undirected brain connectivity network [37]. Each channel is represented as a node in the graph, and we depict the correlation strength between channels as the edges connecting them.

After constructing the weighted brain network, a range of graph theory measurements can be used as features for analyzing EEG signals. These measurements capture various aspects of the network's structure and organization, including degree, similarity, assortativity, and core structures [38], [39]. We use the clustering coefficient to reduce the dimension to 30 variables:

C(v) = \frac{2e(N(v))}{|N(v)|(|N(v)| - 1)}   (4)

In this equation, 2e(N(v)) counts the total number of edges in the neighborhood of v, and |N(v)|(|N(v)| − 1) is the total number of possible edges in the neighborhood of v. The coefficient 2 in the numerator accounts for each edge connecting two vertices being counted twice. The clustering coefficient provides insights into the tendency of nodes in a graph to form clusters or communities, with higher values indicating a greater density of interconnected nodes [39].

d) Combining all three features: Inspired by [40], combining features from different domains might improve the quality of the features and the classification performance. We concatenate the three features introduced above, resulting in 90 variables.

4) ML Classifiers and Feature Selection: Initially, the features—BP, CondEn, and the PLV-connectivity network—have high dimensions of 525 (105 × 5), 5565, and 5565 ((11025 − 105)/2 + 105), respectively. We reduced the input variables for subsequent classifier training to 30 for each feature by applying PCA and the clustering coefficient for feature selection. Generally, Discriminant Analysis and SVMs are frequently used as non-neural-network classifiers in BCI [41]. We incorporated features extracted from EEG signals to train 11 classifiers simultaneously: LDA, QDA, Logistic Regression, Gaussian Naive Bayes, Kernel Naive Bayes, Linear SVM, Quadratic SVM, Cubic SVM, Fine Gaussian SVM, Medium Gaussian SVM, and Coarse Gaussian SVM. The highest classification accuracy is selected as the final result. To ensure the validity of our outcomes, particularly for smaller sample groups, we report 5-fold cross-validation accuracy.

Given the significant class imbalance—LRW EEG data points outnumbering HRW by over 3:1—we applied non-repetitive random downsampling to the LRW class. This ensures equal representation of HRW and LRW data points in the training set. Consequently, the chance level of validation accuracy is 50%.
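Each of the three feature families reduces to a short computation. The NumPy sketch below is an illustrative stand-in for the MATLAB pipeline, not the authors' code: a periodogram band power as in Eq. (1), the conditional entropy of Eq. (3) from a discrete joint probability table, and the clustering coefficient of Eq. (4) from a binary adjacency matrix:

```python
import numpy as np

def band_power(x, fs, f_lo, f_hi):
    """Eq. (1): integrate the periodogram |F(w)|^2 over [f_lo, f_hi]."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * len(x))
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return psd[band].sum() * (freqs[1] - freqs[0])  # rectangle-rule integral

def cond_entropy(joint):
    """Eq. (3): H(X|Y) in bits from a joint probability table joint[x, y]."""
    p_y = joint.sum(axis=0)
    h = 0.0
    for j, py in enumerate(p_y):
        if py == 0:
            continue
        p_x_given_y = joint[:, j] / py
        nz = p_x_given_y > 0
        h -= py * np.sum(p_x_given_y[nz] * np.log2(p_x_given_y[nz]))
    return h

def clustering_coeff(adj, v):
    """Eq. (4): C(v) = 2 e(N(v)) / (|N(v)| (|N(v)| - 1)), 0/1 adjacency."""
    nbrs = np.flatnonzero(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    e = adj[np.ix_(nbrs, nbrs)].sum() / 2   # each edge counted twice in sum
    return 2.0 * e / (k * (k - 1))

# A 10 Hz sinusoid should put essentially all of its power in alpha (8-13 Hz).
fs = 128
t = np.arange(0, 4, 1.0 / fs)
sig = np.sin(2 * np.pi * 10 * t)
p_alpha = band_power(sig, fs, 8, 13)
p_gamma = band_power(sig, fs, 30, 64)
```

In the paper, these computations run per channel (band power) or per channel pair (CondEn, PLV-derived graph), producing the 525- and 5565-dimensional vectors that PCA then compresses to 30 components each.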

TABLE I
HUMAN AND LLM ACCURACY FOR TASK 1 AND TASK 3
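The class-balancing and validation protocol of Section III (non-repetitive downsampling of the LRW class to the HRW count, followed by 5-fold cross-validation at a 50% chance level) comes down to index bookkeeping. The sketch below is a schematic of that protocol, not the authors' code:

```python
import numpy as np

def balance_and_fold(n_hrw, n_lrw, n_folds=5, seed=0):
    """Downsample the majority LRW class without replacement so both
    classes are equally represented, then deal the balanced pool into
    cross-validation folds."""
    rng = np.random.default_rng(seed)
    lrw_keep = rng.choice(n_lrw, size=n_hrw, replace=False)  # non-repetitive
    pool = ([("HRW", i) for i in range(n_hrw)]
            + [("LRW", int(i)) for i in lrw_keep])
    order = rng.permutation(len(pool))
    # Dealing indices round-robin gives folds of near-equal size.
    return [[pool[i] for i in order[k::n_folds]] for k in range(n_folds)]

# Counts taken from the paper: 1,162 HRWs vs. 6,109 LRWs.
folds = balance_and_fold(n_hrw=1162, n_lrw=6109)
labels = [lab for fold in folds for lab, _ in fold]
```

After balancing, a classifier that ignores its input can do no better than 50%, which is why the reported >60% validation accuracy is meaningfully above chance.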

Fig. 2. Average Fixation Counts on the HRWs and LRWs. The left figure displays the average fixation count across 12 subjects, including
words without receiving any fixations. “No-fixation” words appear in both HRW and LRW groups. The average fixation count for HRWs appears
much greater in this plot. In contrast, the right figure presents the same comparison but excludes words with no fixations, providing a more robust
assessment of the average fixation differences between HRW and LRW. As expected, when we omit instances of no-fixation words, the average
fixation count for LRWs increases significantly. However, it’s noteworthy that even with this adjustment, the average fixation count for HRWs remains
higher than that of LRWs across all subjects. This observation supports the hypothesis that subjects focus more on words closely aligned with the
keyword. The whiskers in the figures represent the standard deviation across the eight keyword relations.
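The effect described in the caption (zero-fixation words deflating the LRW average more than the HRW average, since unfixated words are mostly low-relevance) can be reproduced with toy counts. The numbers below are invented for illustration, not the paper's data:

```python
# Toy demonstration: including words that received zero fixations pulls
# the average fixation count down, and it does so more for the LRW group
# because unfixated words are predominantly low-relevance.
# Counts are invented for illustration.

hrw_counts = [2, 1, 2, 0, 1, 2]          # few zero-fixation words
lrw_counts = [1, 0, 2, 0, 1, 0, 2, 0]    # many zero-fixation words

def mean_fixations(counts, include_zeros=True):
    kept = counts if include_zeros else [c for c in counts if c > 0]
    return sum(kept) / len(kept)

with_zeros = (mean_fixations(hrw_counts), mean_fixations(lrw_counts))
no_zeros = (mean_fixations(hrw_counts, False),
            mean_fixations(lrw_counts, False))
```

As in Fig. 2, the HRW mean stays higher under both conventions, but the HRW-LRW gap narrows once zero-fixation words are excluded.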

While deep learning approaches have shown promise in EEG classification [42], these models' explainability remains a subject of ongoing discussion [43]. We therefore refrained from using deep neural network techniques in this study.

IV. RESULTS

This section presents the results of our study. First, we discuss the results pertaining to the LLM comparisons, offering statistical insights into the differences between GPT-3.5 and GPT-4 in generating labels for classification. Subsequently, we showcase eye-fixation statistics for HRWs and LRWs. Next, we highlight the FRP analysis of the fixation-locked EEG signal. Finally, we present the outcomes of our binary classification.

A. LLM Result Analysis

1) GPT-3.5 and GPT-4 Comparison: During our experimental investigation involving state-of-the-art LLMs, we observed a remarkable level of accuracy when the models were tasked with answering reading comprehension questions from Tasks 1 and 3. Table I compares the performance of different language models on ZuCo Tasks 1 and 3 with that of the 12 subjects. Given LLMs' generative and non-deterministic nature, each experimental run produced slightly varying outputs. To mitigate this variability and optimize resource utilization, we executed each model five times and calculated the mean of their responses as the final output. From Table I, GPT-4 has the highest mean and lowest standard deviation among the 12 subjects and all LLMs. Because Task 1 focused on sentiment inference, the 12 subjects generally had lower accuracy on it than on Task 3. We did not include Task 2 because it shares the same corpus with Task 3. While GPT-3.5 attained a lower score of 95.59%, it still outperformed all subjects.

GPT-3.5 and GPT-4 categorize words into HRW and LRW sets for all sentences in Task 3. Specifically, GPT-3.5 generates the first group of HRWs and LRWs, while GPT-4 produces the second. By joint selection, we identify common elements between these first and second HRW groups to create a third HRW group, leaving the remaining words to constitute the third LRW group. Unless otherwise stated, references to HRWs and LRWs refer to this third group, jointly selected by GPT-3.5 and GPT-4.

B. Eye-Fixation Statistics

Next, we analyzed eye activities during the reading process. Table II compares the fixation counts and five additional eye-fixation features for HRWs and LRWs. We excluded the "VISITED" category from the initial nine categories of relationships, resulting in 7271 words distributed among the remaining eight categories after the common-set selection of GPT-3.5 and GPT-4.
ZHANG et al.: INTEGRATING LLM, EEG, AND EYE-TRACKING FOR WORD-LEVEL NEURAL STATE CLASSIFICATION 3471

TABLE II
E YE - FIXATION S TATISTICS

Fig. 3. FRP and PSD Analysis. (a) FRP and PSD Analysis: The left figure displays ERPimages for channels Pz and Oz for both groups (HRW
and LRW). Alongside the ERPimages are the mean FRPs and PSD for both conditions across the channels. Areas of significant difference in
the FRPs are highlighted with shaded regions. (b) Topographic Maps of five Frequency Bands: The right figure presents the average BP for nine
subjects across five frequency bands, excluding three due to incomplete data. This includes topographic maps for HRWs and LRWs in the first and
second rows, respectively, with the third row showing the power differences between the two groups. There is a notable concentration of power in
the occipital scalp regions across all bands, indicative of visual processing involvement.

GPT-3.5 and GPT4. Among these eight categories, LRWs sig- fixation counts between two distinct categories: HRWs and
nificantly outnumbered HRWs by a six-to-one ratio, with 6,109 LRWs. The eye-fixation comparison between no-fixation word
LRWs and 1,162 HRWs. However, there is a large fraction of excluded and included is shown in Fig. 2 for all 12 subjects.
words don’t receive any fixation. Subsequently, we analyzed We undertook this step because words lacking any fixations
the fixation per word metric for the HRW and LRW categories are predominantly associated with the LRW category. Our
for all 12 subjects. Note that the data from three subjects were results show HRWs had an average of slightly more fixations
incomplete for one or two relationships. Table II shows that per word than LRWs, with values of 1.5126 and 1.4026,
HRWs received an average of 1.0584 fixations per word, while respectively. The two comparisons of average fixation, show
LRWs received 0.6576 fixations per word, all when no fixation that subjects spend significantly more time on words that
words included. are highly related to the inference target during reading.
‘In our analysis, we also considered excluding words Importantly, it demonstrates consistency between the results
that received no fixations, followed by comparing average from LLMs and human understanding.
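The joint-selection step described above is, in effect, a set intersection over the two models' high-relevance picks. A minimal Python sketch (the sentence and the two model outputs below are illustrative placeholders, not actual ZuCo data or GPT responses):

```python
def joint_selection(hrw_gpt35, hrw_gpt4, all_words):
    """Intersect the two models' high-relevance word sets;
    every remaining word falls into the low-relevance set."""
    hrw_joint = set(hrw_gpt35) & set(hrw_gpt4)
    lrw_joint = set(all_words) - hrw_joint
    return hrw_joint, lrw_joint

# Illustrative example (hypothetical sentence and model outputs):
sentence = ["the", "senator", "won", "the", "election", "in", "ohio"]
hrw_a = {"senator", "election", "ohio"}  # hypothetical GPT-3.5 labels
hrw_b = {"senator", "election", "won"}   # hypothetical GPT-4 labels

hrw, lrw = joint_selection(hrw_a, hrw_b, sentence)
# hrw == {"senator", "election"}; the other words form the LRW set
```

Words on which the two models disagree are deliberately demoted to the LRW set, which is why the jointly selected HRW group is smaller but presumably cleaner than either model's individual output.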

We also compared five eye-fixation features, presented in the last five columns of Table II. Generally, these features all measure the duration of a reader's gaze on a word, capturing nuances of first-pass reading and regressions, and distinguishing between one or multiple fixations. Among these eye-fixation features, HRWs exhibited higher values than LRWs for four out of five metrics, the exception being single fixation duration (SFD). Furthermore, four out of five features showed statistically significant differences, the exception being go-past time (GPT).
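One common way to test such per-feature group differences between two independent samples with unequal sizes and variances is Welch's t-test; the sketch below uses synthetic reading times and is not the authors' exact procedure or data.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)      # sample (n-1) variances
    se2 = va / na + vb / nb                # squared standard error of the mean difference
    t = (mean(a) - mean(b)) / se2 ** 0.5
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Synthetic per-word total reading times (ms) for the two groups:
hrw_trt = [310, 295, 340, 320, 305, 330]
lrw_trt = [250, 270, 240, 265, 255, 260]

t, df = welch_t(hrw_trt, lrw_trt)  # a large positive t means longer gaze on HRWs
```

The t statistic would then be compared against the t-distribution with the computed degrees of freedom to obtain a p-value; that lookup is omitted here to keep the sketch dependency-free.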

C. Fixation-Related Potentials
Next, we illustrate the FRP analysis for nine subjects. We excluded three additional subjects because of incomplete data regarding at least one keyword relationship.

Fig. 3(a) displays the ERPimages, time-locked to fixation onsets, for HRWs and LRWs for Subject ZAB at Pz and Oz, accompanied by the mean FRPs and power spectral densities (PSDs), respectively. The PSDs at Pz and Oz for HRWs and LRWs suggest that the cognitive processing associated with these words does not significantly alter the power spectral profile in the observed frequency range. However, there are slight variations in power at the lower and higher frequencies, specifically in the [0.5, 10] Hz and [25, 45] Hz ranges. The FRP analysis at Pz and Oz reveals temporal windows where the neural response to HRWs differs significantly from that to LRWs. Notably, the Oz site shows more pronounced differences, potentially reflecting specialized processing in the occipital region related to visual aspects and possible emotional or associative processing of the stimuli [44].
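The fixation-locked epoching behind an FRP can be sketched as follows: cut a fixed window around each fixation onset and average point-wise across epochs. The sampling rate, window limits, and signal here are illustrative placeholders, not the ZuCo recording parameters.

```python
import math

def fixation_locked_frp(eeg, onsets, fs=500, tmin=-0.1, tmax=0.5):
    """Cut fixed-length epochs around each fixation onset (in seconds)
    from a single-channel EEG trace and average them into an FRP."""
    pre, post = int(-tmin * fs), int(tmax * fs)
    epochs = []
    for t in onsets:
        i = int(t * fs)                     # onset as a sample index
        if i - pre >= 0 and i + post <= len(eeg):
            epochs.append(eeg[i - pre:i + post])
    # point-wise mean across epochs = fixation-related potential
    n = len(epochs)
    return [sum(col) / n for col in zip(*epochs)]

# Illustrative: 10 s of fake 10 Hz "EEG" and three fixation onsets
fs = 500
eeg = [math.sin(2 * math.pi * 10 * k / fs) for k in range(10 * fs)]
frp = fixation_locked_frp(eeg, onsets=[1.0, 2.5, 4.2], fs=fs)
# each epoch spans -100 ms to +500 ms, i.e. 300 samples at 500 Hz
```

In practice this would be done per channel with baseline correction and artifact rejection; the point of the sketch is only that the fixation onset, not a stimulus trigger, provides the time-locking event.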
Fig. 3(b) presents topographic maps representing the average band power across five frequency bands for nine subjects. The topographic maps in the first and second rows correspond to HRWs and LRWs, respectively. The third row illustrates the differential BP between HRWs and LRWs. Across all frequency bands, there is a notable concentration of power, primarily localized in the occipital scalp regions, particularly within the delta and theta bands. The differences observed in the delta and theta bands might indicate increased attentional and memory-related processes for HRWs, such as top-down attentional modulation. The alpha suppression suggests active engagement across conditions, while the beta and gamma differences indicate subtle variations in cognitive processing [45], [46].

Fig. 4. Comparisons of accuracy by LLMs and classifiers. (a) Classification accuracy by two LLMs: The classification performance, based on Linear SVM, was evaluated using two LLMs (GPT-3.5 and GPT-4) and their jointly selected words, alongside four feature engineering methods. The EEG feature CondEn demonstrated superior performance. A combination of all three EEG features yielded the highest overall performance, with a marginal enhancement in classification accuracy noted for co-selected HRWs. (b) Comparisons across 11 classifiers in 12 subjects: The heatmap comparison highlights variability in the performance of different classifiers across subjects. Linear SVMs consistently have better accuracies.

D. Binary Classification Analysis

1) Subject-Wise Classification Results: This study assessed the viability of using fixation-locked EEG data to detect whether participants looked at HRWs or LRWs. As previously mentioned, we determined the relevance labels using GPT-3.5 and GPT-4 and reported the highest validation accuracies of eleven classifiers.

Fig. 4(a) illustrates the classification accuracy of words labeled by GPT-3.5, GPT-4, and those jointly labeled by both LLMs, based on Linear SVM. Notably, among the three LLM-based methods for HRW and LRW grouping, the joint HRW selection achieved the highest mean accuracy across all three combined-feature methods. This accuracy is slightly higher than that of CondEn, with mean accuracies of 60.03% and 59.37% over 12 subjects, respectively. Importantly, all mean accuracies surpass the chance level. Fig. 4(b) presents a heatmap comparison of the 12 subjects' accuracies using 11 classifiers with combined features. Although there is variability in the performance of different classifiers across subjects, Linear SVM and both Medium and Coarse Gaussian SVMs tend to provide better accuracy.

2) Classifier Performance Analysis: Table III summarizes the average and standard deviation of classification performance among 12 subjects, using four different feature sets and eleven machine-learning algorithms. We noted a tangible variation in the accuracy of the classifiers across distinct methodologies and subjects in Table III. The Linear SVM consistently outperformed the other algorithms, exhibiting a peak mean accuracy of 60.03 ± 1.72% with combined features. Using the second feature set (BP + PCA) resulted in a marginal decrement in the accuracy of all classifiers, with the highest recorded at 56.73 ± 1.80% using Medium Gaussian SVM. In contrast, the third set (CondEn + PCA) enhanced accuracy for specific classifiers, with the Linear SVM being paramount, achieving 59.37 ± 2.05% at its highest. Conversely, employing the fourth set (PLV + clustering coefficient) precipitated a universal decline in overall accuracy across all classifiers, pinpointing 54.70 ± 2.80% for Linear SVM.

TABLE III
MEAN ACCURACY ± STANDARD DEVIATION ACROSS SUBJECTS

V. DISCUSSION AND CONCLUSION

This pilot study introduced a novel BCI baseline that combines LLM-generated labels, particularly from Generative Pre-trained Transformers (GPT-3.5 and GPT-4), with an EEG-based approach for brain state classification and eye-gaze analysis. This is one of the first efforts to use GPT capability for this specialized intersection of cognitive neuroscience and artificial intelligence.

A. Insights From Eye Gaze During Reading

Eye gaze serves as a significant biomarker, holding essential information for understanding the cognitive processes of individuals engaged in task-specific reading activities [47]. In this study, we conducted average fixation analyses on three levels: per individual subject, in relation to specific semantic associations, and at the individual word level. These analyses, leveraging data from 12 participants across eight semantic relations, demonstrate that participants consistently allocate more time to words with high semantic relevance (i.e., keywords) during inference tasks, as corroborated by Appendices A and B.

We also scrutinized single-word fixation statistics across the 12 subjects and eight semantic categories within the HRW and LRW groups. Notably, data were missing for eight relationship instances (Subject ZGW omitted "JOB," ZKB was missing "WIFE," and ZPH lacked both "POLITICAL AFFILIATION" and "WIFE"); we included these gaps in the supplementary materials. Our analyses reveal that HRWs elicited significantly higher fixation counts compared to LRWs here as well, shedding light on participants' comprehension approaches within the corpus.

Table II's eye-gaze metrics distinctly show variable engagement with words based on their semantic relevance. The elevated fixation counts and prolonged gaze durations for HRWs reinforce the focus on semantically critical terms. These terms not only captured initial attention, as reflected in the first fixation duration, but also maintained it, as evident in the total reading time. Additionally, the shorter single fixation duration on HRWs suggests efficient cognitive processing of these terms, while a slight increase in go-past time indicates an extra layer of cognitive effort.

B. Fixation-Related EEG Analysis and Classification

Unlike traditional BCIs, which relied on precise stimulus presentation as timing markers to extract event-related EEG activities, such as the P300 and steady-state visual evoked potentials, in well-controlled laboratory environments, our approach leveraged fixation onsets to capture EEG signals related to words during natural reading. This implementation significantly enhances the practicality of BCIs for real-world applications.

Fig. 3 presents EEG data related to natural reading, revealing subtle yet discernible differences in brain activity in response to words of varying semantic relevance. The ERP and PSD data across the Pz and Oz channels suggest that HRWs may elicit slightly different fixation-related potentials compared to LRWs, as indicated by the shaded areas of the graphs. The topographic maps further demonstrate average band power across five frequency bands, with the bottom row highlighting modest differences in power in the occipital region, suggesting a potential disparity in visual processing.

We evaluated the performance of four distinct LLMs to generate robust labels for improving classification outcomes. Our hybrid architecture, combining GPT-3.5 and GPT-4 as word labelers with eye-tracking and BCI components, demonstrated robust performance, achieving an accuracy rate exceeding 60% in the classification of word relevance. This enhancement was realized by applying SVMs to three domain-specific features: BP, CondEn combined with PCA, and PLV-based graph-theory techniques. Each feature was carefully chosen for its well-established utility in BCI research and its capacity to enhance the SNR. Additionally, we explored the pairwise coherence of the five frequency bands but ultimately decided against its use because of its computational complexity, particularly when considering the 105 EEG channels we employed.

The most relevant work to our study is our preliminary experiment detailed in [21], which used seven naive NLP models to determine words 'similar' to inference keywords and executed classification using a deep network. However, this approach encountered significant overfitting after 100-150
epochs. The CNN's test accuracy was only marginally better than the LDA model's, with the highest test accuracy at 59.3% for cross-subject conditions and 63.3% for within-subject conditions. In contrast, our current work compares 11 non-deep-learning methods using 5-fold validation, both enhancing the robustness of our findings and establishing a new baseline for classifying brain states based on word importance, especially given the high complexity of word-level EEG classification during natural reading comprehension.

Hollenstein et al. [48] used the same ZuCo dataset for EEG cross-subject classification to differentiate between two reading paradigms: normal reading and task-specific reading. However, they applied sentence-level labels for predictions, which diverges from our objective. Duan et al. [49] and Wang and Ji [50] focused on brain-to-text tasks, encoding EEG signals to match word embeddings using language models. Our study aims to discern distinct brain states indicated by EEG biomarkers, whereas theirs primarily translates EEG into words with moderate success.

C. Challenges and Future Work

Despite these advances, the study has several limitations. It faces challenges because of the "black box" nature of LLMs, particularly in the context of non-deterministic relations, such as "AWARD," where certain output words appear incongruous. This limitation might affect our findings' generalizability and underscores the need for a quantitative assessment to ensure the accuracy and validity of keyword identification.

Additionally, contextual complexities often influence semantic classifications. For example, "gold" acquires distinct semantic relevance when juxtaposed with terms like "medal." Sentences incorporating specific target terms, such as "NATIONALITY" or "WIFE," exhibit a significant disparity in the distribution between HRWs and LRWs, making them more deterministic. These discrepancies add complexity to the classification of EEG data and introduce the possibility of contamination within the dataset, especially when the meaning of words is most effectively comprehended within the context of phrases rather than in isolation.

This study underscores the potential for more expansive research on elucidating reading-related cognitive behaviors. The promise of integrating LLMs into BCIs also points towards future advancements in reading-assistance technologies. While acknowledging its limitations and complexities, our work is an early yet significant contribution, paving the way for more integrated studies to foster a deeper understanding of the multifaceted interplay between neuroscience and computational linguistics.

REFERENCES

[1] H. C. Wang et al., "Scientific discovery in the age of artificial intelligence," Nature, vol. 620, no. 7972, pp. 47–60, Aug. 2023.
[2] K. Singhal et al., "Large language models encode clinical knowledge," Nature, vol. 620, no. 7972, pp. 172–180, 2023.
[3] M. Abdullah, A. Madain, and Y. Jararweh, "ChatGPT: Fundamentals, applications and social impacts," in Proc. 9th Int. Conf. Social Netw. Anal., Manage. Secur. (SNAMS), Nov. 2022, pp. 1–8.
[4] S. Bubeck et al., "Sparks of artificial general intelligence: Early experiments with GPT-4," 2023, arXiv:2303.12712.
[5] D. Gunning, M. Stefik, J. Choi, T. Miller, S. Stumpf, and G.-Z. Yang, "XAI-explainable artificial intelligence," Sci. Robot., vol. 4, no. 37, 2019, Art. no. eaay7120.
[6] M. A. Just and P. A. Carpenter, "A theory of reading: From eye fixations to comprehension," Psychol. Rev., vol. 87, no. 4, pp. 329–354, 1980.
[7] K. Rayner, "Eye movements in reading and information processing: 20 years of research," Psychol. Bull., vol. 124, no. 3, pp. 372–422, 1998.
[8] W. Kintsch, Comprehension: A Paradigm for Cognition. Cambridge, U.K.: Cambridge Univ. Press, 1998.
[9] M. Binz and E. Schulz, "Using cognitive psychology to understand GPT-3," Proc. Nat. Acad. Sci. USA, vol. 120, no. 6, Feb. 2023, Art. no. e2218523120.
[10] L. Ouyang et al., "Training language models to follow instructions with human feedback," in Proc. Adv. Neural Inf. Process. Syst., vol. 35, 2022, pp. 27730–27744.
[11] C. Pandarinath et al., "High performance communication by people with paralysis using an intracortical brain–computer interface," eLife, vol. 6, Feb. 2017, Art. no. e18554.
[12] D. Shawky and A. Badawi, "Towards a personalized learning experience using reinforcement learning," in Machine Learning Paradigms: Theory and Application, 2019, pp. 169–187.
[13] S. Ge, Z. Zhu, B. Wu, and E. S. McConnell, "Technology-based cognitive training and rehabilitation interventions for individuals with mild cognitive impairment: A systematic review," BMC Geriatrics, vol. 18, no. 1, pp. 1–19, Dec. 2018.
[14] M. Kutas and K. D. Federmeier, "Thirty years and counting: Finding meaning in the N400 component of the event-related brain potential (ERP)," Annu. Rev. Psychol., vol. 62, no. 1, pp. 621–647, Jan. 2011.
[15] M. Kutas and S. A. Hillyard, "Reading senseless sentences: Brain potentials reflect semantic incongruity," Science, vol. 207, no. 4427, pp. 203–205, Jan. 1980.
[16] W. Kintsch, "The role of knowledge in discourse comprehension: A construction-integration model," Psychol. Rev., vol. 95, no. 2, p. 163, 1988.
[17] J. S. B. T. Evans, "Dual-processing accounts of reasoning, judgment, and social cognition," Annu. Rev. Psychol., vol. 59, no. 1, pp. 255–278, Jan. 2008.
[18] P. N. Johnson-Laird, Mental Models: Towards a Cognitive Science of Language, Inference, and Consciousness. Cambridge, MA, USA: Harvard Univ. Press, 1983.
[19] D. S. McNamara and J. Magliano, "Toward a comprehensive model of comprehension," in Psychology of Learning and Motivation, vol. 51. Academic, 2009, ch. 9, pp. 297–384, doi: 10.1016/S0079-7421(09)51009-2.
[20] X. Liu and Z. Cao, "Enhance reading comprehension from EEG-based brain–computer interface," in Advances in Artificial Intelligence (Lecture Notes in Computer Science), vol. 14471, T. Liu, G. Webb, L. Yue, and D. Wang, Eds. Singapore: Springer, 2024, doi: 10.1007/978-981-99-8388-9_44.
[21] Q. Li, Reading Comprehension Analysis and Prediction Based on EEG and Eye-Tracking Techniques. San Diego, CA, USA: Univ. of California, San Diego, 2021.
[22] H. Zeng, C. Yang, G. Dai, F. Qin, J. Zhang, and W. Kong, "EEG classification of driver mental states by deep learning," Cognit. Neurodyn., vol. 12, no. 6, pp. 597–606, Dec. 2018.
[23] R. J. Seitz et al., "Valuating other people's emotional face expression: A combined functional magnetic resonance imaging and electroencephalography study," Neuroscience, vol. 152, no. 3, pp. 713–722, Mar. 2008.
[24] M. Tanaka, A. Ishii, and Y. Watanabe, "Neural effects of mental fatigue caused by continuous attention load: A magnetoencephalography study," Brain Res., vol. 1561, pp. 60–66, May 2014.
[25] N. Y. AbdulSabur et al., "Neural correlates and network connectivity underlying narrative production and comprehension: A combined fMRI and PET study," Cortex, vol. 57, pp. 107–127, Aug. 2014.
[26] Q. Wang, S. Yang, M. Liu, Z. Cao, and Q. Ma, "An eye-tracking study of website complexity from cognitive load perspective," Decis. Support Syst., vol. 62, pp. 1–10, Jun. 2014.
[27] N. Hollenstein, J. Rotsztejn, M. Troendle, A. Pedroni, C. Zhang, and N. Langer, "ZuCo, a simultaneous EEG and eye-tracking resource for natural sentence reading," Sci. Data, vol. 5, no. 1, pp. 1–13, Dec. 2018.
[28] H. Brouwer, H. Fitz, and J. Hoeks, "Getting real about semantic illusions: Rethinking the functional role of the P600 in language comprehension," Brain Res., vol. 1446, pp. 127–143, Mar. 2012.

[29] C. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. Bethard, and D. McClosky, "The Stanford CoreNLP natural language processing toolkit," in Proc. 52nd Annu. Meeting Assoc. Comput. Linguistics, Syst. Demonstrations, 2014, pp. 55–60.
[30] A. Culotta, A. McCallum, and J. Betz, "Integrating probabilistic extraction models and data mining to discover relations and patterns in text," in Proc. Main Conf. Human Lang. Technol. Conf. North Amer. Chapter Assoc. Comput. Linguistics, 2006, pp. 296–303.
[31] T.-P. Jung, S. Makeig, M. Westerfield, J. Townsend, E. Courchesne, and T. J. Sejnowski, "Analyzing and visualizing single-trial event-related potentials," in Advances in Neural Information Processing Systems, vol. 11, M. Kearns, S. Solla, and D. Cohn, Eds. Cambridge, MA, USA: MIT Press, 1998.
[32] I. Jayarathne, M. Cohen, and S. Amarakeerthi, "Person identification from EEG using various machine learning techniques with inter-hemispheric amplitude ratio," PLOS ONE, vol. 15, no. 9, 2020, Art. no. e0238872, doi: 10.1371/journal.pone.0238872.
[33] M. Rubinov and O. Sporns, "Complex network measures of brain connectivity: Uses and interpretations," NeuroImage, vol. 52, no. 3, pp. 1059–1069, Sep. 2010.
[34] Y. Zhang, Y. Liao, Y. Zhang, and L. Huang, "Emergency braking intention detect system based on K-order propagation number algorithm: A network perspective," Brain Sci., vol. 11, no. 11, p. 1424, Oct. 2021.
[35] W. Ding, Y. Zhang, and L. Huang, "Using a novel functional brain network approach to locate important nodes for working memory tasks," Int. J. Environ. Res. Public Health, vol. 19, no. 6, p. 3564, Mar. 2022.
[36] Y. Chen, Y. Zhang, W. Ding, F. Cui, and L. Huang, "Research on working memory states based on weighted k-order propagation number algorithm: An EEG perspective," J. Sensors, vol. 2022, pp. 1–10, Jul. 2022.
[37] S. Aydore, D. Pantazis, and R. M. Leahy, "A note on the phase locking value and its properties," NeuroImage, vol. 74, pp. 231–244, Jul. 2013.
[38] A. Fornito, A. Zalesky, and E. Bullmore, Fundamentals of Brain Network Analysis. New York, NY, USA: Academic, 2016.
[39] E. Bullmore and O. Sporns, "Complex brain networks: Graph theoretical analysis of structural and functional systems," Nature Rev. Neurosci., vol. 10, no. 3, pp. 186–198, Mar. 2009.
[40] K.-J. Chiang, S. Dong, C.-K. Cheng, and T.-P. Jung, "Using EEG signals to assess workload during memory retrieval in a real-world scenario," J. Neural Eng., vol. 20, no. 3, Jun. 2023, Art. no. 036010.
[41] F. Lotte et al., "A review of classification algorithms for EEG-based brain–computer interfaces: A 10 year update," J. Neural Eng., vol. 15, no. 3, Apr. 2018, Art. no. 031005, doi: 10.1088/1741-2552/aab2f2.
[42] V. Lawhern, A. Solon, N. Waytowich, S. M. Gordon, C. Hung, and B. J. Lance, "EEGNet: A compact convolutional neural network for EEG-based brain–computer interfaces," J. Neural Eng., vol. 15, no. 5, 2018, Art. no. 056013.
[43] Y. Li, R. Yin, H. Park, Y. Kim, and P. Panda, "Wearable-based human activity recognition with spatio-temporal spiking neural networks," 2022, arXiv:2212.02233.
[44] V. Wyart and C. Tallon-Baudry, "How ongoing fluctuations in human visual cortex predict perceptual awareness: Baseline shift versus decision bias," J. Neurosci., vol. 29, no. 27, pp. 8715–8725, Jul. 2009.
[45] C. Tallon-Baudry, O. Bertrand, M.-A. Hénaff, J. Isnard, and C. Fischer, "Attention modulates gamma-band oscillations differently in the human lateral occipital cortex and fusiform gyrus," Cerebral Cortex, vol. 15, no. 5, pp. 654–662, May 2005, doi: 10.1093/cercor/bhh167.
[46] S. Palva, S. Kulashekhar, M. Hämäläinen, and J. M. Palva, "Localization of cortical phase and amplitude dynamics during visual working memory encoding and retention," J. Neurosci., vol. 31, no. 13, pp. 5013–5025, Mar. 2011.
[47] S.-C. Chen, H.-C. She, M.-H. Chuang, J.-Y. Wu, J.-L. Tsai, and T.-P. Jung, "Eye movements predict students' computer-based assessment performance of physics concepts in different presentation modalities," Comput. Educ., vol. 74, pp. 61–72, May 2014.
[48] N. Hollenstein et al., "The ZuCo benchmark on cross-subject reading task classification with EEG and eye-tracking data," Frontiers Psychol., vol. 13, Jan. 2023, Art. no. 1028824.
[49] Y. Duan, J. Zhou, Z. Wang, Y.-K. Wang, and C.-T. Lin, "DeWave: Discrete EEG waves encoding for brain dynamics to text translation," 2023, arXiv:2309.14030.
[50] Z. Wang and H. Ji, "Open vocabulary electroencephalography-to-text decoding and zero-shot sentiment classification," in Proc. AAAI Conf. Artif. Intell., 2022, vol. 36, no. 5, pp. 5350–5358.
