EEG2IMAGE: Image Reconstruction From EEG Brain Signals
ABSTRACT

Reconstructing images using brain signals of imagined visuals may provide augmented vision to the disabled, leading to the advancement of Brain-Computer Interface (BCI) technology. Recent progress in deep learning has boosted the study of synthesizing images from brain signals using Generative Adversarial Networks (GAN). In this work, we propose a framework for synthesizing images from the brain activity recorded by an electroencephalogram (EEG) using small-size EEG datasets. This brain activity is recorded from the subject's scalp using EEG while they are asked to visualize certain classes of objects and English characters. We use a contrastive learning method in the proposed framework to extract features from the EEG signals and synthesize images from the extracted features using a conditional GAN. We modify the loss function used to train the GAN, which enables it to synthesize 128 × 128 images using a small number of images. Further, we conduct ablation studies and experiments to show the effectiveness of our proposed framework over other state-of-the-art methods on the small EEG dataset.

Index Terms— Deep Learning, EEG, GAN

1. INTRODUCTION

The human visual system is considered a highly advanced, intelligent information processor that generates rich 3D visuals with semantic construction. The most challenging problem is to train artificial machines to construct images directly from brain activity for semantic categories [1]. The possibility of brain-to-image construction significantly contributes to advancing Brain-Computer Interface (BCI) technology. The core purpose of BCI, using invasive or non-invasive techniques, is to provide communication and control of external devices by thought alone or with minimal muscular activity. This effort is highly relevant in neuro-rehabilitation, i.e., supporting patients with disabilities to have better everyday communication in their lives. Decoding brain responses to imagination/visual stimuli would greatly benefit communication for disabled people. The most widely employed brain imaging modality with high temporal precision is electroencephalography (EEG), owing to its relatively low cost and portability.

EEG is a non-invasive technique, which makes it the most practical methodology for recording the electrophysiological dynamics of the brain. EEG signals have been used to analyze a wide spectrum of research, from cognitive to clinical aspects [2]. For decades, EEG signals have been widely employed for classifying several disorders or understanding brain dynamics, and the successful application of these past results can be seen in BCI. Previous efforts by the research community have shown promising results in augmenting healthy individuals with additional sensory or motor capabilities [3]. The most intriguing task is to decode the content of the mind using brain signals and draw a link between the two. The two most challenging endeavors in this space are to reconstruct visualized images [4] and to decode imagined speech to text [5] based solely on recorded brain signals. Vision neuroscientists made the initial attempts [6, 7, 8] to provide evidence of visual stimulus features being represented in recorded brain activity. These attempts initiated the classification of image categories using brain signals with deep learning, and further led to the reconstruction and generation of images [9].

Our contributions are as follows: 1) a framework that can synthesize images using a small EEG dataset; 2) the use of semi-hard triplet loss [10] to learn features from EEG signals, which shows better k-means accuracy than the softmax counterpart, as shown in Figs. 2 and 3; and 3) the use of mode seeking regularization [11] and data augmentation [12] based modifications to a conditional GAN for synthesizing high-quality images, as shown in Fig. 4(b).¹

This work is supported by a Prime Minister Research Fellowship (PMRF-2122-2557) to PS, and thanks to SERB and PlayPowerLabs for the PMRF to PP. We thank FICCI for facilitating the PMRF of PP.
¹ https://github.com/prajwalsingh/EEG2Image

2. RELATED WORK

The development of advanced deep generative architectures in recent times has made it possible to see images from brain signals. The initial study by Kavasidis et al. implemented long short-term memory (LSTM) networks stacked with generative techniques to generate seen images from 40 ImageNet classes [4]. ThoughtViz [13] encouraged the design of conditional GANs (cGAN) to decode EEG signals using a small dataset consisting of imagination tasks comprising digits, characters, and objects. Several architectures have been developed using CNNs and LSTMs on time-series data from most biological domains. The capability of LSTMs to identify sequential patterns and of CNNs to locate neighborhood features was recently combined with a spectral normalization generative adversarial network (SNGAN) to yield seen images from EEG encodings [14]. Researchers are putting effort into reconstructing geometrical shapes from brain activities, primarily in generating precise edges and other low-level details. Further advancements in GANs have led to synthesizing natural geometrical shapes by enforcing semantic alignment constraints to construct natural shapes at the pixel level [15, 16]. Recently, a siamese network was utilized to maximize the relationship between extracted manifold brain feature representations and visual features [17]. The obtained representation demonstrated better image classification and saliency detection performance on the learned manifold.
Fig. 1. This figure illustrates the proposed framework for EEG feature extraction and image generation. a) Shows the LSTM network with 128 hidden units that transforms an EEG signal into a 128-D feature vector. b) Shows the GAN network with a data augmentation block that prevents the discriminator from memorizing the small dataset and helps the generator synthesize high-quality images.
Fig. 2. t-SNE [20] visualization of the Object test dataset [21] EEG feature space, which is learned using label supervision, with test classification accuracy 0.75 and k-means accuracy 0.13.

Fig. 3. t-SNE [20] visualization of the Object test dataset [21] EEG feature space, which is learned using triplet loss, with test k-means accuracy 0.25. Each cluster's equivalent EEG-based generated images are also visualized in this plot.
Khare et al. proposed conditional progressive growing of GANs (CProGAN) to develop perceived images [18] and showed a higher inception score than previous related work. Recent work on a contrastive self-supervised approach has been shown to maximize the mutual information between visual stimuli and corresponding EEG latent representations [19]. They proposed an approach that employed cross-modal alignment, enforcing image retrieval at the instance level rather than pixel-level generation.

3. PROPOSED METHOD

In this work, we propose a framework, shown in Fig. 1, for visualizing brain activity EEG signals. The framework consists of a two-phase approach: 1) extracting good features from the EEG signals with a contrastive learning approach, and 2) a conditional data-efficient GAN that transforms the extracted EEG features into an image. In our case, a good feature implies useful information about an image that can help the GAN reconstruct that image.

Feature Extraction. Recent works [17] have shown that contrastive learning-based approaches outperform the supervised setting for generalized feature learning on downstream tasks such as object detection, classification, and saliency-map estimation from EEG signals. Building on this, we use a triplet loss-based contrastive learning [10] approach in the proposed framework for EEG feature learning. Triplet loss aims to minimize the distance between two data points with the same label and maximize the distance between two data points with different labels. To prevent the feature extraction network from squashing the representation of each data point into a small cluster, a margin term is used in the triplet loss. It ensures that the distance between features of same-label data is close to zero and greater than the margin for different-label data. The triplet loss is formulated as follows:

$$\min_{\theta} \; \mathbb{E}\left[\, \lVert f_\theta(x_a) - f_\theta(x_p) \rVert_2^2 \;-\; \lVert f_\theta(x_a) - f_\theta(x_n) \rVert_2^2 \;+\; \beta \,\right] \qquad (1)$$

where $f$ is a function parameterized by $\theta$ that maps EEG signals to a feature space, i.e., $f_\theta : \mathbb{R}^{C \times T} \rightarrow \mathbb{R}^{128}$. The goal of Eqn. 1 is to minimize the distance between an anchor ($a$) EEG signal and a positive ($p$) EEG signal of the same class as the anchor, and to maximize the distance between the anchor EEG signal and a negative ($n$) EEG signal of a different class by at least the margin distance. This formulation is also known as metric learning or contrastive learning. The idea behind using this formulation is to ensure that EEG signals generated by brain activity for similar images lie close to each other in the learned feature space [22]. For learning better features, we use semi-hard triplets, where the distance of the negative sample is greater than that of the positive but less than the margin, together with an online hard-triplet mining strategy similar to [10].
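To make this stage concrete, below is a minimal PyTorch sketch of an LSTM encoder $f_\theta$ together with a batch-wise semi-hard triplet loss. The names (EEGEncoder, semi_hard_triplet_loss) and the margin value are ours for illustration; this is a simplified sketch of the idea, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EEGEncoder(nn.Module):
    """f_theta: R^{C x T} -> R^{128}; an LSTM with 128 hidden units (names are ours)."""
    def __init__(self, channels=14, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=channels, hidden_size=hidden, batch_first=True)

    def forward(self, x):                   # x: (batch, T, C) EEG windows
        _, (h, _) = self.lstm(x)            # h: (1, batch, 128), last hidden state
        return F.normalize(h[-1], dim=-1)   # unit-norm 128-D EEG feature

def semi_hard_triplet_loss(feats, labels, margin=0.2):
    """Online semi-hard mining over a batch, in the spirit of [10].

    For each anchor-positive pair, pick a negative that is farther than the
    positive but still inside the margin; fall back to the farthest negative.
    """
    d = torch.cdist(feats, feats) ** 2                  # squared pairwise distances
    total, count = feats.new_zeros(()), 0
    for i in range(feats.size(0)):
        pos = labels == labels[i]
        pos[i] = False
        neg = labels != labels[i]
        if not pos.any() or not neg.any():
            continue
        for j in torch.where(pos)[0]:
            d_ap = d[i, j]
            semi = neg & (d[i] > d_ap) & (d[i] < d_ap + margin)
            d_an = d[i][semi].min() if semi.any() else d[i][neg].max()
            total = total + F.relu(d_ap - d_an + margin)
            count += 1
    return total / max(count, 1)
```

In training, a batch of EEG windows of shape (batch, T, 14) would be encoded and this loss minimized with any stochastic optimizer.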
Fig. 4. (a) ThoughtViz [13], (b) EEG2Image (Ours).
Table 1. Comparison of Inception Score values (on all classes of the Object dataset [21]).

Image Generation. In the proposed framework, we use a Generative Adversarial Network (GAN) [23] to synthesize an image from the extracted EEG feature. A GAN architecture consists of two sub-networks: a Generator (G) and a Discriminator (D). The purpose of the generator is to learn the transformation between a latent distribution ($p_Z$) and the real-world data distribution ($p_{data}$). In our case, we assume the latent distribution is an isotropic Gaussian $\mathcal{N}(0, I)$ from which we sample a noise vector $z \in \mathbb{R}^{128}$. The discriminator learns to distinguish real images from synthesized images. The complete GAN architecture is trained in a min-max optimization setting, where the discriminator tries to maximize the score for real images $D(x)$ and minimize the score for generated images $D(G(z))$; in contrast, the generator tries to minimize $(1 - D(G(z)))$, which is only possible if the generator synthesizes photorealistic images. The complete GAN optimization process can be represented as:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log(D(x))] + \mathbb{E}_{z \sim p_Z(z)}[\log(1 - D(G(z)))] \qquad (2)$$

Similar to [13], we aim to develop a framework that can utilize a small-size EEG dataset for generating images from EEG signals. To overcome the problem of a small dataset, [13] used a trainable weighted Gaussian layer [25], which learns the mean (µ) and variance (σ) for the encoded EEG signal. In this work, we follow a different strategy than [13]. Instead, we use a Conditional DCGAN [26] architecture with the following modifications: 1) following the work of [27], we use hinge loss for stable training of the GAN; 2) we add a differentiable data augmentation block between the generator and discriminator, which helps the network learn from a small data size [12]; and 3) to ensure mode diversity in the synthesized images, we use a mode seeking regularization term [11] in the generator loss.

Training a GAN for synthesizing photorealistic images requires a large amount of data [12], and other deep learning approaches face the same data scarcity issue. Zhao et al. [12] have shown that the problem of sparse data when training a GAN can be resolved by adding a Differentiable Data Augmentation (DiffAug) block between the generator and discriminator, which is illustrated in Fig. 1(b). The issue with sparse data is that the discriminator can easily memorize the data, which causes a vanishing gradient problem for the generator. The data augmentations we use for our GAN network are translation and color jittering. The final loss terms we aim to optimize for the proposed EEG2Image are given below:

$$\mathcal{L}_D = \mathbb{E}_{(x, \psi) \sim p_{data}(x)}[\max(0, 1 - D(T(x), \psi))] + \mathbb{E}_{z \sim p_Z(z),\, \psi \sim p_{data}(x)}[\max(0, 1 + D(T(G(z, \psi)), \psi))] \qquad (5)$$

$$\mathcal{L}_{ms} = \min_G \left( \frac{d_I(G(\psi, z_1), G(\psi, z_2))}{d_z(z_1, z_2)} \right)^{-1} \qquad (6)$$

$$\mathcal{L}_G = -\mathbb{E}_{z \sim p_Z(z),\, \psi \sim p_{data}(x)}[D(T(G(z, \psi)), \psi)] + \alpha \cdot \mathcal{L}_{ms} \qquad (7)$$

where $\mathcal{L}_D$ is the discriminator loss, $\mathcal{L}_G$ is the generator loss, $\mathcal{L}_{ms}$ is the mode seeking regularizer term [11], $T$ is the DiffAugment [12] function, $\psi$ is the EEG feature vector, and $\alpha$ is a regularizer weight term, which is kept at 1.0 for all the experiments.
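A compact sketch of these losses is given below, assuming D and G are callables taking (image, ψ) and (z, ψ) respectively. The T here is a simplified stand-in for the full DiffAugment policy of [12] (brightness jitter plus random translation only), and all names are ours for illustration rather than the authors' code.

```python
import torch
import torch.nn.functional as F

def T(x, pad=4):
    """Differentiable stand-in for DiffAugment [12]: color jitter + translation."""
    x = x + (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5)  # brightness jitter
    b, c, h, w = x.shape
    x = F.pad(x, [pad, pad, pad, pad])                               # zero-pad borders
    i = int(torch.randint(0, 2 * pad + 1, (1,)))
    j = int(torch.randint(0, 2 * pad + 1, (1,)))
    return x[:, :, i:i + h, j:j + w]                                 # random shift

def discriminator_loss(D, G, x_real, psi, z):
    """Hinge loss of Eq. (5); generated images are detached for the D update."""
    loss_real = F.relu(1.0 - D(T(x_real), psi)).mean()
    loss_fake = F.relu(1.0 + D(T(G(z, psi).detach()), psi)).mean()
    return loss_real + loss_fake

def generator_loss(D, G, psi, z1, z2, alpha=1.0, eps=1e-8):
    """Eqs. (6)-(7): adversarial term plus the mode seeking regularizer [11]."""
    x1, x2 = G(z1, psi), G(z2, psi)
    adv = -D(T(x1), psi).mean()
    # inverse ratio d_z / d_I: small image differences for distant z's are penalized
    ms = (z1 - z2).abs().mean() / ((x1 - x2).abs().mean() + eps)
    return adv + alpha * ms
```

Augmenting both real and generated images through the same differentiable T lets the discriminator's gradients flow back to the generator while denying it the chance to memorize the few real samples.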
4. EXPERIMENTS AND RESULTS

In the first part of this section, we discuss the experimental setup used to train the feature extraction and generative networks, including the dataset. Later in this section, we discuss the ablation studies done to justify the choices made for the proposed framework.
Fig. 6. Ablation study showing the qualitative results on the Object dataset [21] using different loss combinations for training the GAN network: (a) without mode seeking loss and without DiffAug, inception score 3.61; (b) with mode seeking loss and without DiffAug, inception score 4.27; (c) without mode seeking loss and with DiffAug, inception score 6.5.
Table 2. Mean and standard deviation (SD) of Inception Scores for each class of the Objects dataset [21].

| Object Class | Apple (n07739125) | Car (n02958343) | Dog (n02084071) | Gold (n03445326) | Mobile (n02992529) | Rose (n12620196) | Scooter (n03791053) | Tiger (n02129604) | Wallet (n04548362) | Watch (n04555897) | All |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mean | 6.09 | 6.15 | 6.99 | 6.98 | 7.33 | 5.44 | 5.81 | 5.67 | 6.48 | 6.67 | 6.78 |
| SD | 0.05 | 0.084 | 0.031 | 0.082 | 0.030 | 0.089 | 0.077 | 0.057 | 0.086 | 0.037 | 0.086 |
Datasets. We use the EEG data from [21]. This dataset consists of EEG signals for three different stimulus categories: Digits, Characters, and Objects. In our study, we use only the Characters and Objects data, because these are the more diverse and complex data for showing the effectiveness of the proposed framework. The Characters dataset consists of ten English alphabet classes and is a subset of Chars74K [28]. Similarly, the Objects dataset consists of ten different object classes and is a subset of ImageNet [29]. While the brain activity EEG signals were being collected, participants were asked to think about one of these characters/objects at a time. To record the EEG signals, an Emotiv EPOC+ [30] device was used, which has 14 channels with a sampling rate of 128 Hz per channel. For each dataset, 23 participants were asked to visualize each of the ten classes; thus, we have 230 EEG samples per dataset. For our work, we use the EEG data provided by the authors, with the train-test splits of [13]. We would like to thank the authors for making it publicly available.

EEG2Feature. The first stage of our proposed framework is to convert EEG signals into features useful for image generation. For this, we design two regimes. In the first regime, we train a classification network for extracting EEG features, as done in [13]. The classifier is an LSTM [31] network with 128 hidden units trained with softmax cross-entropy loss. We use k-means clustering [32] accuracy as a metric for the learned EEG features, i.e., higher k-means accuracy implies better learned representations [22]. The first regime gives us 74.3% and 75.4% classification accuracy on the test data of the Object and Character datasets [21], respectively, with k-means accuracies of 12.6% and 11.3%; further, we plot a t-SNE map [20] to visualize the clustering of the test data features from the Object dataset in Fig. 2. For the second regime, we use a contrastive learning approach to learn the features of an EEG signal. As discussed in Sec. 3, we use semi-hard triplet loss for training the LSTM [31] network with 128 hidden units. The goal of the triplet loss is to structure the feature space in such a way that positive pairs lie in close proximity to each other while negative pairs are positioned far apart. The k-means accuracy we obtain on the test data is 25% for the Object dataset and 20% for the Character dataset. Further, we plot a t-SNE map [20] to visualize the clustering of the test data features from the Object dataset in Fig. 3. We can see that the k-means accuracy and the t-SNE plot are better for the second regime. Therefore, we decided to use the contrastive learning method as the EEG feature extractor for our proposed framework.
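The k-means accuracy referred to above can be computed by clustering the learned features and then optimally matching cluster indices to ground-truth labels. Below is a sketch under the assumption that scikit-learn and SciPy are available and that labels are integers in [0, n_classes); the exact evaluation protocol of [22] may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.optimize import linear_sum_assignment

def kmeans_accuracy(features, labels, n_classes=10):
    """Cluster features with k-means, match cluster ids to true labels with the
    Hungarian algorithm, and report the accuracy of the best matching."""
    pred = KMeans(n_clusters=n_classes, n_init=10).fit_predict(features)
    # confusion matrix between cluster assignments and ground-truth labels
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for p, t in zip(pred, labels):
        cm[p, t] += 1
    row, col = linear_sum_assignment(-cm)   # maximize matched counts
    return cm[row, col].sum() / len(labels)
```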
Feature2Image. The second stage of our proposed framework is to synthesize photorealistic images from the EEG features extracted in the first stage. For synthesizing the images, we use the Conditional DCGAN [26] with the modifications discussed in Sec. 3. We use the Inception Score (IS) [33] as a metric for image quality comparison with other methods. Table 1 shows that our proposed GAN method performs better at synthesizing images from a smaller amount of EEG data. In Table 2, we show the per-class inception score for the test data of the Object dataset [21]. We also performed a qualitative analysis of the synthesized images for both the Object and Character datasets, shown in Figs. 4 and 5. We performed several ablation studies to verify the importance of each loss in training the GAN network for the proposed framework. For these, we trained the proposed conditional GAN (cGAN) in three different regimes on the Object dataset [21]. In the first regime, we train the cGAN without mode seeking regularization and without DiffAugment, shown in Fig. 6(a), which has an inception score of 3.61. In the second regime, we add only the mode seeking regularization term and train the cGAN from scratch; Fig. 6(b) shows an improvement, with an inception score of 4.27. In the third and last regime, we train the cGAN with the DiffAugment block, showing a large improvement in the synthesized images, as shown in Fig. 6(c), with an inception score of 6.5. Based on these experiments, we use both the mode-seeking regularization term and the DiffAugment block in the proposed framework.
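For reference, the Inception Score [33] reported in Tables 1 and 2 is defined as IS = exp(E_x[KL(p(y|x) || p(y))]). A small sketch computing it is given below, assuming probs holds the softmax outputs of a pretrained Inception network on the generated images (obtaining those probabilities is left to the reader).

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """IS = exp( E_x KL(p(y|x) || p(y)) ); probs has shape (N, n_classes)."""
    p_y = probs.mean(axis=0, keepdims=True)   # marginal class distribution p(y)
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))
```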
5. CONCLUSION

This work proposes a framework that uses a small-sized dataset for generating images from brain activity EEG signals. Our proposed framework achieves a better inception score than the previously proposed method on the small-sized EEG dataset and synthesizes images of size 128 × 128. The framework uses a contrastive learning approach to learn good features from EEG data, which is empirically shown to perform better than the softmax-based supervised learning method. We have performed several ablation studies to demonstrate the effectiveness of the modified GAN loss function in synthesizing high-quality images. As future work, we plan to tackle large-size EEG datasets and to work toward completely self-/un-supervised learning for extracting features from EEG data and for image synthesis.
6. REFERENCES

[1] Yoichi Miyawaki, Hajime Uchida, Okito Yamashita, Masa-aki Sato, Yusuke Morito, Hiroki C Tanabe, Norihiro Sadato, and Yukiyasu Kamitani, "Visual image reconstruction from human brain activity using a combination of multiscale local image decoders," Neuron, vol. 60, no. 5, pp. 915–929, 2008.

[2] Vangelis Sakkalis, "Applied strategies towards EEG/MEG biomarker identification in clinical and cognitive research," Biomarkers in Medicine, vol. 5, no. 1, pp. 93–105, 2011.

[3] Mahdi Bamdad, Homayoon Zarshenas, and Mohammad A Auais, "Application of BCI systems in neurorehabilitation: a scoping review," Disability and Rehabilitation: Assistive Technology, vol. 10, no. 5, pp. 355–364, 2015.

[4] Isaak Kavasidis, Simone Palazzo, Concetto Spampinato, Daniela Giordano, and Mubarak Shah, "Brain2Image: Converting brain signals into images," in 25th ACM MM, 2017.

[5] Zhenhailong Wang and Heng Ji, "Open vocabulary electroencephalography-to-text decoding and zero-shot sentiment classification," in AAAI, 2022, vol. 36, pp. 5350–5358.

[6] Thomas Carlson, David A Tovar, Arjen Alink, and Nikolaus Kriegeskorte, "Representational dynamics of object vision: the first 1000 ms," Journal of Vision, vol. 13, no. 10, pp. 1–1, 2013.

[7] Thomas A Carlson, Hinze Hogendoorn, Ryota Kanai, Juraj Mesik, and Jeremy Turret, "High temporal resolution decoding of object position and category," Journal of Vision, vol. 11, no. 10, pp. 9–9, 2011.

[8] Koel Das, Barry Giesbrecht, and Miguel P Eckstein, "Predicting variations of perceptual performance across individuals from neural activity using pattern classifiers," NeuroImage, vol. 51, no. 4, pp. 1425–1437, 2010.

[9] Concetto Spampinato, Simone Palazzo, Isaak Kavasidis, Daniela Giordano, Nasim Souly, and Mubarak Shah, "Deep learning human mind for automated visual classification," in IEEE CVPR, 2017, pp. 6809–6817.

[10] Florian Schroff, Dmitry Kalenichenko, and James Philbin, "FaceNet: A unified embedding for face recognition and clustering," in IEEE CVPR, 2015, pp. 815–823.

[11] Qi Mao, Hsin-Ying Lee, Hung-Yu Tseng, Siwei Ma, and Ming-Hsuan Yang, "Mode seeking generative adversarial networks for diverse image synthesis," in IEEE/CVF CVPR, 2019.

[12] Shengyu Zhao, Zhijian Liu, Ji Lin, Jun-Yan Zhu, and Song Han, "Differentiable augmentation for data-efficient GAN training," NeurIPS, vol. 33, pp. 7559–7570, 2020.

[13] Praveen Tirupattur, Yogesh Singh Rawat, Concetto Spampinato, and Mubarak Shah, "ThoughtViz: Visualizing human thoughts using generative adversarial network," in Proceedings of the 26th ACM International Conference on Multimedia (MM '18), New York, NY, USA, 2018, pp. 950–958, ACM.

[14] Xiao Zheng, Wanzhong Chen, Mingyang Li, Tao Zhang, Yang You, and Yun Jiang, "Decoding human brain activity with deep learning," Biomedical Signal Processing and Control, vol. 56, pp. 101730, 2020.

[15] Xiang Zhang, Xiaocong Chen, Manqing Dong, Huan Liu, Chang Ge, and Lina Yao, "Multi-task generative adversarial learning on geometrical shape reconstruction from EEG brain signals," arXiv preprint arXiv:1907.13351, 2019.

[16] Ahmed Fares, Sheng-hua Zhong, and Jianmin Jiang, "Brainmedia: A dual conditioned and lateralization supported GAN (DCLS-GAN) towards visualization of image-evoked brain activities," in 28th ACM MM, 2020, pp. 1764–1772.

[17] Simone Palazzo, Concetto Spampinato, Isaak Kavasidis, Daniela Giordano, Joseph Schmidt, and Mubarak Shah, "Decoding brain representations by multimodal learning of neural activity and visual features," IEEE PAMI, vol. 43, no. 11, pp. 3833–3849, 2020.

[18] Sanchita Khare, Rajiv Nayan Choubey, Loveleen Amar, and Venkanna Udutalapalli, "NeuroVision: perceived image regeneration using CProGAN," Neural Computing and Applications, vol. 34, no. 8, pp. 5979–5991, 2022.

[19] Zesheng Ye, Lina Yao, Yu Zhang, and Silvia Gustin, "See what you see: Self-supervised cross-modal retrieval of visual stimuli from brain activity," arXiv preprint arXiv:2208.03666, 2022.

[20] Laurens Van der Maaten and Geoffrey Hinton, "Visualizing data using t-SNE," JMLR, vol. 9, no. 11, 2008.

[21] Pradeep Kumar, Rajkumar Saini, Partha Pratim Roy, Pawan Kumar Sahu, and Debi Prosad Dogra, "Envisioned speech recognition using EEG sensors," Personal and Ubiquitous Computing, vol. 22, no. 1, pp. 185–199, Feb 2018.

[22] Yaling Tao, Kentaro Takagi, and Kouta Nakata, "Clustering-friendly representation learning via instance discrimination and feature decorrelation," arXiv preprint arXiv:2106.00131, 2021.

[23] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, "Generative adversarial networks," Communications of the ACM, vol. 63, no. 11, pp. 139–144, 2020.

[24] Augustus Odena, Christopher Olah, and Jonathon Shlens, "Conditional image synthesis with auxiliary classifier GANs," in ICML, PMLR, 2017, pp. 2642–2651.

[25] Swaminathan Gurumurthy, Ravi Kiran Sarvadevabhatla, and R Venkatesh Babu, "DeLiGAN: Generative adversarial networks for diverse and limited data," in IEEE CVPR, 2017.

[26] Alec Radford, Luke Metz, and Soumith Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," arXiv preprint arXiv:1511.06434, 2015.

[27] Jae Hyun Lim and Jong Chul Ye, "Geometric GAN," arXiv preprint arXiv:1705.02894, 2017.

[28] Teófilo Emídio De Campos, Bodla Rakesh Babu, Manik Varma, et al., "Character recognition in natural images," VISAPP (2), vol. 7, no. 2, 2009.

[29] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei, "ImageNet: A large-scale hierarchical image database," in IEEE CVPR, 2009, pp. 248–255.

[30] "EPOC+ - 14 Channel EEG — emotiv.com," https://www.emotiv.com/epoc/, [Accessed 14-Oct-2022].

[31] Sepp Hochreiter and Jürgen Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, 1997.

[32] Xin Jin and Jiawei Han, "K-Means Clustering," pp. 563–564, Springer US, Boston, MA, 2010.

[33] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen, "Improved techniques for training GANs," in 30th NeurIPS (NIPS '16), Red Hook, NY, USA, 2016, pp. 2234–2242, Curran Associates Inc.