
Alljoined - A dataset for EEG-to-Image decoding

Jonathan Xu*1,2, Bruno Aristimunha*3,4, Max Emanuel Feucht*1,5,
Emma Qian†1, Charles Liu†1,2, Tazik Shahjahan†1,2, Martyna Spyra†1, Steven Zifan Zhang1,6,
Nicholas Short1,2, Jioh Kim1,6, Paula Perdomo1,6, Ricky Renfeng Mao1,2, Yashvir Sabharwal1,
Michael Ahedor1, Moaz Shoura6, Adrian Nestor6

*equal contribution. †core contribution. 1AllJoined 2The University of Waterloo 3Université Paris-Saclay, Inria TAU, CNRS, LISN 4Federal University of ABC 5Vrije Universiteit Amsterdam 6University of Toronto

arXiv:2404.05553v1 [q-bio.NC] 8 Apr 2024

Abstract

We present Alljoined, a dataset built specifically for EEG-to-Image decoding. Recognizing that an extensive and unbiased sampling of neural responses to visual stimuli is crucial for image reconstruction efforts, we collected data from 8 participants looking at 10,000 natural images each. We have currently gathered 46,080 epochs of brain responses recorded with a 64-channel EEG headset. The dataset combines response-based stimulus timing, repetition between blocks and sessions, and diverse image classes with the goal of improving signal quality. For transparency, we also provide data quality scores. We publicly release the dataset and all code at https://linktr.ee/alljoined1.

1. Introduction

In the fields of cognitive neuroscience and medical imaging, advancements in deep learning have demonstrated unparalleled precision in decoding brain activity [6, 20, 36–38]. Researchers have translated the intricate patterns of brain activity during various cognitive processes by utilizing neuroimaging modalities such as functional Magnetic Resonance Imaging (fMRI) and electroencephalography (EEG). In this context, one particular area of interest is image reconstruction, which involves the decoding of neural responses to visual stimuli, offering insights into how the brain encodes and processes visual information [6, 9, 10, 24, 35, 37].

While fMRI has traditionally been the primary tool for image reconstruction due to its excellent spatial resolution, its slow temporal dynamics severely limit actual clinical usage. EEG, on the other hand, is a medical modality available in everyday clinical contexts with excellent time resolution [33, 39, 40]. As neurons fire at millisecond scales, the high temporal resolution of EEG is crucial for real-time monitoring of neural dynamics [12, 17, 45]. Additionally, EEG is portable, easier to set up, and much more cost-effective than fMRI, making it suitable for real-world applications, including brain-computer interfaces and clinical diagnostics.

The development of very large fMRI-to-image datasets has proven foundational for recent breakthroughs in deep-learning image reconstruction. Inspired by the need for such datasets in the EEG domain, we present Alljoined, a novel, large-scale dataset covering a wide range of naturalistic stimuli that allows for robust, generalizable image reconstruction efforts. Our contributions are as follows:
• We propose a stimulus presentation approach that tailors trial duration and session and block repetitions to maximize the signal-to-noise ratio (SNR).
• We introduce a diverse dataset of EEG responses to 9k unique naturalistic images for each of the eight participants, with 1k additional images shared between participants.
• We perform qualitative comparisons against current EEG-to-image datasets.

2. Related Work

2.1. EEG-to-Image Datasets

EEG-to-image datasets consist of EEG waveforms recorded while participants watch visual stimuli, enabling the study of neural representations in the brain. However, previous research on EEG-based image reconstruction has often relied on datasets exhibiting severe limitations regarding acquisition design or generalizability to naturalistic stimuli [27, 39, 48].

A popular EEG-image dataset is Brain2Image [22], which consists of evoked responses to visual stimuli from distinct image classes. Each block consists of stimuli corresponding to only a single image class. There are 40 classes, with 50 unique images in each class. This dataset has been criticized for having no train-test separation during recording, for block-specific stimulus patterns, and for a lack of consistency across different frequency bands.
These factors can incorrectly boost model performance by giving extraneous proxy information about the block rather than the actual image-specific brain responses [3, 28]. Moreover, the limited number of classes may encourage models to perform 40-way classification rather than image reconstruction. Recent studies achieving impressive reconstruction results have relied on this dataset [5, 23, 26], which may affect the validity of their results. As recommended by [3, 28], the stimuli within each block in our dataset were chosen randomly across a variety of natural images, effectively minimizing the risk of block-class correlations.

The diversity of decoding stimuli further limits current EEG-based image reconstruction datasets. While Brain2Image consists of 40 classes, several studies utilize ThoughtViz [46], a dataset of visual imagery of characters and objects belonging to only 10 different classes. Both Brain2Image and ThoughtViz fail to represent the continuous, diverse quality of naturalistic stimuli. The same limitation applies to studies utilizing a severely limited quantity of naturalistic stimuli. Approaches to EEG-based image reconstruction derived from ThoughtViz [32, 39], Brain2Image [5, 23, 26], or other equally selective datasets [2, 48] may thus fail to generalize well to diverse, real-world stimuli.

To account for the diverse and continuous nature of naturalistic images, Alljoined consists of 10,000 images per participant, each belonging to at least one of 80 MS-COCO [30] object categories. Importantly, each MS-COCO category is broader than a single object class (e.g. the things category includes car, skateboard, hat, etc.), and each image can belong to up to 5 classes [29].

There are also existing datasets that include naturalistic stimuli but compromise in other domains. The MindBigData initiative [47] captures a wide sample of images from the ImageNet dataset but is derived from only a single individual, limiting its potential for training image reconstruction models that generalize to other individuals. The THINGS-EEG1 [16] and THINGS-EEG2 [13] datasets were acquired using short image presentation times of 50 and 100 ms and stimulus onset asynchronies of 100 and 200 ms, respectively.

Although the rapid serial visual presentation (RSVP) [15] paradigm proposes disentangling the temporal dynamics of visual processing and the categorical abstraction of non-target stimuli, it is not ideal for capturing cortical image processing beyond early visual activity with low noise. We see that [41] obtained the highest accuracy with their EEG-image classifier when focusing on 320-480 ms after stimulus onset, and [34] was able to extract relevant decoding features even around 550 ms after stimulus onset. This suggests that while it takes 50-120 ms for object recognition of a stimulus to register in the visual cortex, a longer stimulus period is beneficial for accuracy on downstream tasks. Alljoined consists of extensive data from n=8 participants, measured with an inter-stimulus interval of 300 ms, which captures important hallmarks of visual processing while maintaining a high presentation frequency [16, 44].

2.2. fMRI-Image Datasets

The recent development of large functional magnetic resonance imaging (fMRI) datasets has enabled researchers to decode and reconstruct images observed by humans with unprecedented accuracy.

The Brain, Object, Landscape Dataset (BOLD5000) [8] contains brain responses from 4 human participants who viewed 5,254 images depicting natural scenes from the Scene UNderstanding (SUN) [49], MS-COCO [30], and ImageNet [11] datasets. Similarly, in the Generic Object Decoding Dataset (GOD) [21], 1,200 images from the ImageNet database were cropped and shown to 5 participants, resulting in one of the first datasets to establish methods for decoding generic object categories from brain activity.

The Natural Scenes Dataset (NSD) [4] consists of the brain responses of 8 human participants passively viewing 9,000–10,000 color natural scenes from MS-COCO. This magnitudes-larger dataset has fueled the leaps in reconstruction accuracy seen in recent work like MindEye2 [38]. However, the adaptation of such impressive achievements to real-life contexts is quite limited, as MRI scanners are notoriously expensive and difficult to access.

3. Methods and Materials

3.1. Participants

We collected data from eight participants (six male, two female), with an average age of 22 ± 0.64 years, all right-handed and with normal or corrected-to-normal vision. All participants were healthy, with no neurocognitive impairments, except two participants who reported a history of mental health disorders (e.g. GAD, ADHD). Each participant provided informed consent. The Research Ethics Board (name suppressed for double-blind review) approved the procedures.

3.2. Stimuli

We use the same visual stimuli as the fMRI Natural Scenes Dataset (NSD) [4], consisting of 70,566 images portraying everyday objects and situations in their natural context. All NSD images are drawn from the MS-COCO dataset [29] and include annotations about the objects in each image and their corresponding categories. Each image can contain more than one object and more than one object category. These fine-grained object categories are further grouped into supercategories, each of which comprehensively includes all related categories as defined subsets.
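To make this category structure concrete, the sketch below shows how per-image supercategories can be looked up with the pycocotools package. This is an illustration rather than the Alljoined preparation code, and the annotation file path is an assumed placeholder.

```python
# Illustrative sketch (not the authors' code): mapping an MS-COCO image
# to its object categories and supercategories with pycocotools.
# The annotation file path is a placeholder assumption.
from pycocotools.coco import COCO

coco = COCO("annotations/instances_train2017.json")

# Map each of the 80 fine-grained category ids to its supercategory.
cats = coco.loadCats(coco.getCatIds())
supercategory = {c["id"]: c["supercategory"] for c in cats}

def supercategories_for(image_id: int) -> set:
    """Set of supercategories annotated in one image (often more than one)."""
    anns = coco.loadAnns(coco.getAnnIds(imgIds=image_id))
    return {supercategory[a["category_id"]] for a in anns}
```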
The current study uses a subset consisting of the first 960 of the 1,000 images shown to all participants in the NSD study. These images are drawn from the shared1000 subset of the NSD dataset, which comprises 1,000 specially curated images that all participants in the original NSD study were presented with [4]. Within this subset, the supercategory person was the most represented, occurring in 50.94% of all images, followed by animal (23.54%) and vehicle (23.33%). The distribution of the supercategories is shown in Figure 1.

Figure 1. Top 12 most frequently occurring supercategories in our dataset.
3.3. Procedure

Images were displayed to participants over the course of multiple one-hour-long sessions. Each session consisted of 16 blocks, wherein the images shown in the first 8 unique blocks were repeated in the second 8 blocks. The repeated blocks (e.g., blocks 1 & 9, 2 & 10, etc.) contained the same stimuli but in a shuffled order to avoid sequence effects. Within each block, 120 images from the NSD dataset were presented twice, along with 24 oddball stimuli, amounting to 264 images per block.

Given the within-block and between-block repetitions of NSD images, each NSD image was presented 4 times to obtain a higher signal-to-noise ratio for the corresponding brain activity. Within each trial, an image (NSD or oddball) was presented for 300 ms, followed by 300 ms of a black screen; a white fixation cross was visible on the screen throughout the entire trial.

At the end of each trial, an extra jitter of 0-50 ms was added for randomness. To ensure focus, participants were prompted to press the space bar when two consecutive trials contained the same image. These oddball trials occurred 24 times within each block; brain activity in response to oddballs has been discarded from the dataset due to motion artifacts, EEG repetition suppression, and other issues.
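For illustration, a block's trial list of this kind could be generated as in the following sketch. This is our reconstruction of the structure just described, not the published stimulus code; all names and the RNG seeding are hypothetical.

```python
# Illustrative reconstruction of one block's trial sequence (not the
# published stimulus code; names and seeding are hypothetical).
import random

def build_block(nsd_image_ids, n_oddballs=24, seed=0):
    """Build a 264-trial block: 120 NSD images x2, plus 24 oddball repeats."""
    rng = random.Random(seed)
    assert len(nsd_image_ids) == 120
    trials = list(nsd_image_ids) * 2  # each NSD image appears twice
    rng.shuffle(trials)               # shuffled to avoid sequence effects
    # Oddballs: insert a copy of the preceding image so that two
    # consecutive trials show the same stimulus (space-bar target).
    for pos in sorted(rng.sample(range(1, len(trials)), n_oddballs), reverse=True):
        trials.insert(pos, trials[pos - 1])
    assert len(trials) == 264
    # Per-trial timing: 300 ms image, 300 ms black screen, 0-50 ms jitter.
    return [(img, 300, 300, rng.uniform(0, 50)) for img in trials]
```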
3.4. Hardware Setup

We recorded data using a 64-electrode BioSemi ActiveTwo system, digitized at a rate of 512 Hz with 24-bit A/D conversion. The montage was arranged in the International 10-20 System, and the electrode offset was kept below 40 mV. We used a 22-inch Dell monitor at a resolution of 1080p/60Hz to display the visual stimuli. As depicted in Figure 3, the monitor was positioned centrally at a distance of 80 cm to maintain a 3.5° visual angle for the stimuli. We avoided larger angles to minimize the occurrence of gaze drift.
3.5. Pre-processing

For dataset pre-processing, we follow recent work on the importance of separating biomarkers of the central nervous system from peripheral signals, as described in [7], and applied only the minimum necessary steps. The dataset was pre-processed entirely using the MNE-Python library [14].
Filtering. Initially, we applied band-pass filtering with a low cutoff of 0.5 Hz and a high cutoff of 125 Hz using overlap-add finite impulse response filtering, with the range based on [43]. We then applied a notch filter at 60 Hz to eliminate power line noise.
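A minimal MNE-Python sketch of this step might look as follows; the recording file name is a placeholder assumption, and the parameters are as stated above.

```python
# Minimal sketch of the filtering step with MNE-Python.
# The .bdf file name is a placeholder assumption.
import mne

raw = mne.io.read_raw_bdf("sub-01_ses-01.bdf", preload=True)  # BioSemi recording
raw.filter(l_freq=0.5, h_freq=125.0, method="fir")  # overlap-add FIR band-pass
raw.notch_filter(freqs=60.0)  # suppress 60 Hz power line noise
```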
Independent Component Analysis (ICA). Next, we performed an ICA decomposition using a FastICA model [1, 18] to separate non-Gaussian biological artifact noise from the signal source. We used a decomposition that retained 95% of the variance and excluded ICs corresponding to eye blinks from the raw data.
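In MNE-Python, this could be sketched as below, continuing from the filtered recording. The paper does not specify how blink ICs were identified; flagging components by correlation with a frontal channel (Fp1 here) is one common heuristic and is an assumption on our part.

```python
# Sketch of the FastICA step, continuing from the filtered `raw` above.
# Blink-IC selection via a frontal channel is an assumed heuristic.
from mne.preprocessing import ICA

ica = ICA(n_components=0.95, method="fastica", random_state=0)  # keep 95% variance
ica.fit(raw)
eog_inds, _scores = ica.find_bads_eog(raw, ch_name="Fp1")  # blink-like ICs
ica.exclude = eog_inds
raw = ica.apply(raw)  # reconstruct the signal without the excluded ICs
```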
Epoching. We segment the data into intervals starting at −50 ms relative to stimulus onset and ending at the end of each trial at 600 ms, following the trial timing described in Section 3.3. We discard the jitter time between trials.
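A matching epoching sketch in MNE-Python, with the trigger channel name and the event code for image onsets assumed for illustration:

```python
# Sketch of epoching from -50 ms to 600 ms around stimulus onset,
# continuing from `raw` above. Trigger channel and event code are
# placeholder assumptions.
import mne

events = mne.find_events(raw, stim_channel="Status")  # BioSemi trigger channel
epochs = mne.Epochs(
    raw, events, event_id={"image": 1},
    tmin=-0.05, tmax=0.6,          # -50 ms to 600 ms; jitter time discarded
    baseline=None, preload=True,   # baseline correction is applied later
)
```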
Artifact correction. We use a peak-to-peak threshold for each sensor to identify low-quality trials with the Autoreject algorithm [19], which automatically determines whether a trial should be (i) repaired by interpolation from neighboring sensors or (ii) excluded from further analysis. Autoreject performs a grid search to determine appropriate values for ρ, the number of channels to interpolate, and κ, the fraction of channels that must agree for consensus. By looking at the number of erroneous sensors per trial, this approach allows correction on a per-trial basis instead of applying a single global threshold to all trials. A mean of 130.75 epochs was dropped per session, with a standard deviation of 260.44.
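The autoreject package exposes exactly this cross-validated grid search; a minimal sketch follows, with illustrative candidate grids rather than the values used by the authors.

```python
# Sketch of per-trial repair/rejection with the autoreject package [19],
# operating on `epochs` from the previous step. The candidate grids for
# rho (n_interpolate) and kappa (consensus) are illustrative only.
from autoreject import AutoReject

ar = AutoReject(n_interpolate=[1, 2, 4, 8],           # rho candidates
                consensus=[0.2, 0.4, 0.6, 0.8, 1.0],  # kappa candidates
                random_state=0)
epochs_clean = ar.fit_transform(epochs)  # interpolate or drop epochs per trial
```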
Baseline correction. Finally, we re-reference our channels using an average reference scheme before applying a baseline correction window from −50 ms to 0 ms relative to stimulus onset, following the recommendation of [42] for ERP baselines. Baseline correction subtracts the average activation during the baseline interval from each epoch to remove noise from the signal.
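A sketch of these final two operations in MNE-Python, applied to the artifact-corrected epochs from the previous step:

```python
# Sketch of the final re-referencing and baseline correction.
epochs_clean.set_eeg_reference("average")           # average reference
epochs_clean.apply_baseline(baseline=(-0.05, 0.0))  # subtract -50..0 ms mean
```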
[Figure 2 schematic: Sessions 1-10; within each session, 8 blocks repeated ×2 = 16 blocks; within each block, 120 NSD images ×2 + 24 oddballs = 264 trials; each trial: 300 ms stimulus, 300 ms rest, 0-50 ms jitter.]

Figure 2. Schematic overview of the structure of trials, blocks, and sessions. Each of the 120 block-specific NSD images is presented twice within each block, and each of the 8 session-specific blocks is presented twice within each session. Each participant performed two sessions on different days. Each of the 10 sessions thus consists of 960 NSD images repeated four times within and across blocks, totaling 9,600 unique NSD images per participant.

Figure 3. Experimental setup with the monitor 80 cm from the participant.

4. Analysis

4.1. ERP Analysis

The distribution of event-related potentials (ERPs) across all 64 channels is displayed for a single session of one participant and averaged over all participants and sessions in Figure 4. We observe a strong, consistent rise in activity beginning after 150 ms, with a peak between 250 and 300 ms. Note that this latency aligns with the timing parameters of our experimental design, which involves a 300 ms presentation followed by a 300 ms rest period, with an additional 0-50 ms jitter. Both participant- and cohort-level activity exhibit a sustained high level of activity up until 500 ms after stimulus onset, where a consistent dip in activity is observed for both the single participant and the whole cohort.

Topographies of activation additionally reveal a strong concentration of positive activation at occipital, parietal, and partly temporal electrodes, and a consistently negative activation at central and frontal areas. This topographical distribution was stable across the duration of the ERP and corresponded well between the single participant and the cohort. Given the strong peak in activity at the occipital and parietal areas, we further investigated the distribution of ERPs across individual participants and sessions at the occipital and parietal electrodes, as displayed in Figure 5. While the magnitude of activation differs between participants, we observe a by-and-large consistent activation pattern across participants and sessions.

4.2. SNR Analysis

The signal-to-noise ratio (SNR) serves as a pivotal metric in evaluating the efficacy of our dataset. To ascertain the SNR, we employ the standardized measurement error (SME) as a gauge for noise assessment [31].
Figure 4. EEG topographic maps and corresponding signals at all 64 electrodes, averaged over (a) 3,823 events from the first session of the fifth participant and (b) 43,070 events across all sessions and participants in the Alljoined1 dataset, highlighting individual and common brain activity patterns associated with image presentation.

Figure 5. ERPs averaged over occipital and parietal electrodes for all participants and sessions. Shaded areas around the grand-average ERP indicate standard deviations at all timepoints.

The SME is determined by calculating the standard deviation of the aggregated waveform average for each event type across all trials and then dividing this by the square root of the event type's occurrence count. The SNR is subsequently derived by dividing the mean signal values by their corresponding SME.
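Written out in our own notation (the paper gives this definition verbally; see also [31]), for an event type with N trials, mean amplitude x̄, and across-trial standard deviation σ at a given channel and timepoint:

$$\mathrm{SME} = \frac{\sigma}{\sqrt{N}}, \qquad \mathrm{SNR} = \frac{\bar{x}}{\mathrm{SME}} = \frac{\bar{x}\sqrt{N}}{\sigma}$$

The √N in the denominator of the SME is what makes the four within- and between-block repetitions of each image directly improve the attainable SNR.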
Figure 6 compares the average SNR across all events in a single session for participant 5 with the average SNR across all events for both sessions concatenated. We see that the SNR is noticeably lower in the multi-session graph. This is due to the increased number of repetitions for a given event at different timepoints, which yields a disproportionately higher standard deviation and, consequently, a higher SME and lower SNR. That is not to say that the quality of the data is worse; rather, it reflects more accurate SNR estimates, as there are more data points distributed across different sessions. We also observe that the single-session graph is more volatile across time, showing greater variance in SNR values: with fewer trials to average over, the noise metric itself is less accurate. This fluctuation underscores the limited accuracy of noise metrics derived from fewer trials, highlighting the critical importance of incorporating repeated measures across sessions or blocks for robust SNR evaluation.

Furthermore, we observe a strong SNR increase 150 ms after stimulus onset. Note that this increase in SNR exhibits the same spike timing seen in our earlier topographic maps and averaged ERP graphs, suggesting that meaningful activity starts to surface with a considerable delay with respect to stimulus onset.
4.3. Discussion

The ERP and topography analyses, as well as our analysis of the SNR, reveal and reinforce several benefits of the acquired dataset with regard to stimulus design and timing.
1. Stimulus Duration and Stimulus Onset Asynchrony: A 300 ms presentation window with a subsequent 300 ms rest period allows the capture of both early and late cognitive processes, as evidenced by the single-subject peaks from around 262 ms up to 479 ms in Figure 6a and the averaged peaks at 293 ms and 521 ms in Figure 6b, respectively. The duration of 300 ms for image presentation is sufficient for the brain to engage in both perceptual encoding and initial stages of memory processing, which may not be as effectively captured with shorter presentation times. The subsequent 300 ms rest period provides a window to measure the brain's higher-level visual and semantic response to the stimuli. The whole ERP thus reflects not only the initial feed-forward transfer of sensory information to visual cortical areas but also the subsequent recurrent interactions involved in attention and semantic analysis that unfold over hundreds of milliseconds after stimulus onset. The relevance of longer presentation times and longer stimulus onset asynchrony is additionally supported by the sustained ERP activation presented in Figure 5, as well as the latency of the SNR increase and peak in Figure 6.
Figure 6. Signal-to-noise ratio (SNR) averaged across each session, across each block, and within each block for participant 5. Left: SNR for the first session only. Right: SNR for all sessions.

2. Comparison with Prior Studies: Presentation times of only 100 ms, or stimulus onset asynchronies of only 200 ms, fail to capture the rich neural dynamics associated with image processing, which involve both lower- and higher-level processing. In THINGS EEG2 [13], with a shorter 100 ms presentation time followed by 100 ms of rest, the stimulus exposure may have been insufficient to elicit the full range of cognitive processes. The limited time window could explain the lesser degree of neural activity in the corresponding time window. Similarly, THINGS EEG1 [16] employed a shorter 50 ms presentation window followed by 50 ms of rest, which, while suitable for examining the earliest stages of sensory processing in the visual cortex, likely precluded the phases of cognitive processing that unfold over a longer period. This includes higher-order mechanisms such as selective attention, working memory updating, and retrieval of semantic associations from long-term memory stores [25].
3. Phase Locking Mitigation: The inclusion of a jitter ranging from 0-50 ms helps mitigate phase locking, a phenomenon where the participant's alpha-wave activity becomes synchronously aligned with the pattern of the stimuli after repeated presentations.
4. Anticipatory Bias Minimization: Additionally, the jitter prevents participants from predicting the exact onset of the next stimulus, reducing the potential for anticipatory neural activity that could confound the data.

In conclusion, the design parameters of our experiment not only accommodate the full breadth of neural responses captured in the critical 300 ms window but also provide a comparative advantage over the shorter intervals used in THINGS EEG2 and THINGS EEG1. The timing of our experiment ensures the acquisition of a comprehensive ERP waveform, contributing to a more nuanced understanding of cognitive processes and neural dynamics.

5. Conclusion

We introduce Alljoined1, an EEG-image dataset that uses well-timed stimuli, repetitions between blocks and sessions, and a wide distribution of natural images to create an improved dataset for image decoding tasks. We believe that its size, diversity, and quality will help promote work to better understand the mechanisms of visual processing and to decode visual responses in clinical and consumer BCI contexts.

Future Directions: We are eager to explore high-density EEG recording of exclusively the occipital and parietal regions to better target the regions of the brain most responsive to visual stimuli. We are also interested in conducting ablation studies on the generalizability of responses to imagined mental imagery. We also believe there is great potential in exploring continuous data collection in natural environments with a wireless headset.

Data availability: The preprocessed EEG dataset is available on OSF. Labels mapping to the corresponding NSD image IDs are included in the object files.

Code availability: The stimulus and preprocessing code to reproduce all the results is available on anonymized GitHub here and here.

6. Acknowledgements

This work was sponsored by ZFellows. BA's work was supported by the DATAIA Convergence Institute as part of the "Programme d'Investissement d'Avenir" (ANR-17-CONV-0003) operated by LISN-CNRS. We would like to thank Dr Sylvain Chevallier for his valuable feedback on this manuscript.

References

[1] Pierre Ablin, Jean-François Cardoso, and Alexandre Gramfort. Faster ICA under orthogonal constraint. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4464–4468. IEEE, 2018.
[2] Hajar Ahmadieh, Farnaz Gassemi, and Mohammad Hasan Moradi. Visual image reconstruction based on EEG signals using a generative adversarial and deep fuzzy neural network. Biomedical Signal Processing and Control, 87:105497, 2024.
[3] Hamad Ahmed, Ronnie B. Wilbur, Hari M. Bharadwaj, and Jeffrey Mark Siskind. Confounds in the Data—Comments on "Decoding Brain Representations by Multimodal Learning of Neural Activity and Visual Features". IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(12):9217–9220, 2022.
[4] Emily J Allen, Ghislain St-Yves, Yihan Wu, Jesse L Breedlove, Jacob S Prince, Logan T Dowdle, Matthias Nau, Brad Caron, Franco Pestilli, Ian Charest, et al. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nature Neuroscience, 25(1):116–126, 2022.
[5] Yunpeng Bai, Xintao Wang, Yan-pei Cao, Yixiao Ge, Chun Yuan, and Ying Shan. DreamDiffusion: Generating high-quality images from brain EEG signals. arXiv preprint arXiv:2306.16934, 2023.
[6] Yohann Benchetrit, Hubert Banville, and Jean-Remi King. Brain decoding: toward real-time reconstruction of visual perception. In The Twelfth International Conference on Learning Representations, 2024.
[7] Philipp Bomatter, Joseph Paillard, Pilar Garces, Jörg Hipp, and Denis Engemann. Machine learning of brain-specific biomarkers from EEG. bioRxiv, 2024.
[8] Nadine Chang, John A Pyles, Austin Marcus, Abhinav Gupta, Michael J Tarr, and Elissa M Aminoff. BOLD5000, a public fMRI dataset while viewing 5000 visual images. Scientific Data, 6(1):49, 2019.
[9] Zijiao Chen, Jonathan Xu, Jiaxin Qing, Ruilin Li, and Juan Helen Zhou. Structure-Preserved Image Reconstruction from Brain Recordings. In preparation, 2023.
[10] Zijiao Chen, Jiaxin Qing, and Juan Helen Zhou. Cinematic mindscapes: High-quality video reconstruction from brain activity. Advances in Neural Information Processing Systems, 36, 2024.
[11] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
[12] Nadine Dijkstra, Pim Mostert, Floris P de Lange, Sander Bosch, and Marcel AJ van Gerven. Differential temporal dynamics during visual imagery and perception. eLife, 7:e33904, 2018.
[13] Alessandro T. Gifford, Kshitij Dwivedi, Gemma Roig, and Radoslaw M. Cichy. A large and rich EEG dataset for modeling human visual object recognition. NeuroImage, 264:119754, 2022.
[14] Alexandre Gramfort, Martin Luessi, Eric Larson, Denis A Engemann, Daniel Strohmeier, Christian Brodbeck, Roman Goj, Mainak Jas, Teon Brooks, Lauri Parkkonen, et al. MEG and EEG data analysis with MNE-Python. Frontiers in Neuroscience, 7:70133, 2013.
[15] Tijl Grootswagers, Amanda K Robinson, and Thomas A Carlson. The representational dynamics of visual objects in rapid serial visual processing streams. NeuroImage, 188:668–679, 2019.
[16] Tijl Grootswagers, Ivy Zhou, Amanda K Robinson, Martin N Hebart, and Thomas A Carlson. Human EEG recordings for 1,854 concepts presented in rapid serial visual presentation streams. Scientific Data, 9(1):3, 2022.
[17] Assaf Harel, Iris IA Groen, Dwight J Kravitz, Leon Y Deouell, and Chris I Baker. The temporal dynamics of scene processing: A multifaceted EEG investigation. eNeuro, 3(5), 2016.
[18] Aapo Hyvärinen, Juha Karhunen, and Erkki Oja. Independent component analysis, adaptive and learning systems for signal processing, communications, and control. John Wiley & Sons, Inc, 1:11–14, 2001.
[19] Mainak Jas, Denis A Engemann, Yousra Bekhti, Federico Raimondo, and Alexandre Gramfort. Autoreject: Automated artifact rejection for MEG and EEG data. NeuroImage, 159:417–429, 2017.
[20] Vinay Jayaram and Alexandre Barachant. MOABB: trustworthy algorithm benchmarking for BCIs. Journal of Neural Engineering, 15(6):066011, 2018.
[21] Tomoyasu Horikawa and Yukiyasu Kamitani. Generic decoding of seen and imagined objects using hierarchical visual features. Nature Communications, 2017.
[22] Isaak Kavasidis, Simone Palazzo, Concetto Spampinato, Daniela Giordano, and Mubarak Shah. Brain2Image: Converting Brain Signals into Images. In Proceedings of the 25th ACM International Conference on Multimedia, pages 1809–1817, Mountain View, California, USA, 2017. ACM.
[23] Nastaran Khaleghi, Tohid Yousefi Rezaii, Soosan Beheshti, Saeed Meshgini, Sobhan Sheykhivand, and Sebelan Danishvar. Visual Saliency and Image Reconstruction from EEG Signals via an Effective Geometric Deep Network-Based Generative Adversarial Network. Electronics, 11(21):3637, 2022.
[24] Jean-Rémi King, Laura Gwilliams, Chris Holdgraf, Jona Sassenhagen, Alexandre Barachant, Denis Engemann, Eric Larson, and Alexandre Gramfort. Encoding and Decoding Framework to Uncover the Algorithms of Cognition. In The Cognitive Neurosciences. The MIT Press, 2020.
[25] Yixuan Ku. Selective attention on representations in working memory: cognitive and neural mechanisms. PeerJ, 6:e4585, 2018.
[26] Yu-Ting Lan, Kan Ren, Yansen Wang, Wei-Long Zheng, Dongsheng Li, Bao-Liang Lu, and Lili Qiu. Seeing through the Brain: Image Reconstruction of Visual Perception from Human Brain Signals, 2023. arXiv:2308.02510 [cs, eess, q-bio].
[27] Lynn Le, Luca Ambrogioni, Katja Seeliger, Yağmur Güçlütürk, Marcel Van Gerven, and Umut Güçlü. Brain2pix: Fully convolutional naturalistic video reconstruction from brain activity. bioRxiv, pages 2021–02, 2021.
[28] Ren Li, Jared S. Johansen, Hamad Ahmed, Thomas V. Ilyevsky, Ronnie B. Wilbur, Hari M. Bharadwaj, and Jeffrey Mark Siskind. Training on the test set? An analysis of Spampinato et al. [31], 2018. arXiv:1812.07697 [cs, q-bio].
[29] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755. Springer, 2014.
[30] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755. Springer, 2014.
[31] Steven J Luck, Andrew X Stewart, Aaron Matthew Simmons, and Mijke Rhemtulla. Standardized measurement error: A universal metric of data quality for averaged event-related potentials. Psychophysiology, 58(6):e13793, 2021.
[32] Rahul Mishra, Krishan Sharma, R. R. Jha, and Arnav Bhavsar. NeuroGAN: image reconstruction from EEG signals via an attention-based GAN. Neural Computing and Applications, 35(12):9181–9192, 2023.
[33] Dan Nemrodov, Matthias Niemeier, Ashutosh Patel, and Adrian Nestor. The neural dynamics of facial identity processing: insights from EEG-based pattern analysis and image reconstruction. eNeuro, 5(1), 2018.
[34] Dan Nemrodov, Shouyu Ling, Ilya Nudnou, Tyler Roberts, Jonathan S. Cant, Andy C. H. Lee, and Adrian Nestor. A multivariate investigation of visual word, face, and ensemble processing: Perspectives from EEG-based decoding and feature selection. Psychophysiology, 57(3):e13511, 2020.
[35] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
[36] Yannick Roy, Hubert Banville, Isabela Albuquerque, Alexandre Gramfort, Tiago H Falk, and Jocelyn Faubert. Deep learning-based electroencephalography analysis: a systematic review. Journal of Neural Engineering, 16(5):051001, 2019.
[37] Paul Steven Scotti, Atmadeep Banerjee, Jimmie Goode, Stepan Shabalin, Alex Nguyen, Ethan Cohen, Aidan James Dempster, Nathalie Verlinde, Elad Yundler, David Weisberg, Kenneth Norman, and Tanishq Mathew Abraham. Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
[38] Paul S Scotti, Mihir Tripathy, Cesar Kadir Torrico Villanueva, Reese Kneeland, Tong Chen, Ashutosh Narang, Charan Santhirasegaran, Jonathan Xu, Thomas Naselaris, Kenneth A Norman, et al. MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data. arXiv preprint arXiv:2403.11207, 2024.
[39] Prajwal Singh, Pankaj Pandey, Krishna Miyapuram, and Shanmuganathan Raman. EEG2IMAGE: Image Reconstruction from EEG Brain Signals, 2023. arXiv:2302.10121 [cs, q-bio].
[40] Prajwal Singh, Dwip Dalal, Gautam Vashishtha, Krishna Miyapuram, and Shanmuganathan Raman. Learning Robust Deep Visual Representations from EEG Brain Recordings. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 7553–7562, 2024.
[41] Concetto Spampinato, Simone Palazzo, Isaak Kavasidis, Daniela Giordano, Nasim Souly, and Mubarak Shah. Deep learning human mind for automated visual classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6809–6817, 2017.
[42] Darren Tanner, James JS Norton, Kara Morgan-Short, and Steven J Luck. On high-pass filter artifacts (they're real) and baseline correction (it's a good idea) in ERP/ERMF analysis. Journal of Neuroscience Methods, 266:166–170, 2016.
[43] Yunzhe Tao, Tao Sun, Aashiq Muhamed, Sahika Genc, Dylan Jackson, Ali Arsanjani, Suri Yaddanapudi, Liang Li, and Prachi Kumar. Gated transformer for decoding human brain EEG signals. In 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pages 125–130. IEEE, 2021.
[44] Lina Teichmann, Martin N Hebart, and Chris I Baker. Multidimensional object properties are dynamically represented in the human brain. bioRxiv, 2023.
[45] Simon Thorpe, Denis Fize, and Catherine Marlot. Speed of processing in the human visual system. Nature, 381(6582):520–522, 1996.
[46] Praveen Tirupattur, Yogesh Singh Rawat, Concetto Spampinato, and Mubarak Shah. ThoughtViz: Visualizing Human Thoughts Using Generative Adversarial Network. In Proceedings of the 26th ACM International Conference on Multimedia, pages 950–958, Seoul, Republic of Korea, 2018. ACM.
[47] David Vivancos and Felix Cuesta. MindBigData 2022: A Large Dataset of Brain Signals. arXiv preprint arXiv:2212.14746, 2022.
[48] Suguru Wakita, Taiki Orima, and Isamu Motoyoshi. Photorealistic Reconstruction of Visual Texture From EEG Signals. Frontiers in Computational Neuroscience, 15, 2021.
[49] Daniel LK Yamins and James J DiCarlo. Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19(3):356–365, 2016.