Cross-Task Cognitive Load Classification With Identity Mapping-Based Distributed CNN and Attention-Based RNN Using Gabor Decomposed Data Images
Trupti Taori, Shankar Gupta, Sandesh Bhagat, Suhas Gajre & Ramchandra Manthalkar
Department of Electronics and Telecommunication, Shri Guru Gobind Singhji Institute of Engineering and Technology, Nanded, India
ABSTRACT

The cognitive workload is a key to developing a logical and conscious thinking system. Maintaining an optimum workload improves the performance of an individual. The individuals’ psycho-social factors are responsible for creating significant variability in the performance of a task, which poses a significant challenge in developing a consistent model for the classification of cross-task cognitive workload using the physiological signal, Electroencephalogram (EEG). The primary focus of the proposed work is to develop a robust classification model, CARNN, by employing a concatenated deep structure of distributed branches of convolutional neural networks with residual blocks through identity mappings, and a recurrent neural network with an attention mechanism. The EEG data is divided into overlapping segments of milliseconds duration. The segmented EEG data is converted into images using Gabor decomposition with two spatial frequency scales and four orientations and supplied as input to CARNN. The images are formed by interlacing the respective left and right electrode data to capture the data variations effectively. Efficient feature aggregation, with learning of spatial and temporal domain discriminative features through Gabor decomposed data images, improves the training of CARNN. CARNN achieves outstanding performance over traditional classifiers (support vector machine, k-nearest neighbor (KNN), ensemble subspace KNN) and the pre-trained networks (AlexNet, ResNet18/50, VGG16/19, and Inception-v3). The proposed method results in 94.2%, 92.5%, 95.9%, 92.8%, and 94.3% classification accuracy, specificity, sensitivity, precision, and F1-score, respectively. Two visual task levels apart in their complexity are used for cross-task classification of cognitive workload. The proposed method is validated on raw EEG data of 44 participants.

KEYWORDS
Attention mechanism; Cognitive load; Convolutional neural network (CNN); Cross-task; Electroencephalogram; Long short-term memory; Residual block

© 2022 IETE
coding, handwritten number identification, face recognition, vehicle detection, and EEG signal processing [11–13]. The proposed work exploits this concept and hypothesizes that two-dimensional (2D) Gabor decomposed data images of EEG data can be used to efficiently extract crucial features for cross-task cognitive load classification using deep network structures.

The contributions of the proposed method are:

(i) A cascaded deep structure of distributed CNNs with residual blocks and an RNN with an attention mechanism is proposed for efficient feature extraction from the spatial and temporal domains.
(ii) Utilization of the spatial frequency scales and orientations of 2D Gabor filters for image formation from raw EEG data, feeding these images as input to distributed CNNs + attention-based RNN (CARNN). Instead of manipulating the convolutions with Gabor filters in the initial network layers, the emphasis is on giving Gabor processed input to the deep CNN, which aids efficient feature extraction from multiple spatial frequency scales and orientations of the Gabor decomposed data images and enhances the training of the network. To the best of our knowledge, this is the first implementation where EEG data images after 2D Gabor filtering are used to classify cross-task cognitive workload using deep network structures.
(iii) Enhancing the training of the CNN with identity mapping connections. The RNN (LSTM) structure learns temporal features by relating the output features of the distributed CNNs in the temporal domain. An attention network employed with the RNN structure gives more prominence to time steps covering important discriminating activity in the EEG.
(iv) A unified structure is trained end to end, where the total number of residual blocks, the type of identity mapping for the CNN structure, and the total number of RNN layers are decided empirically.

Two different visual tasks employing recognition and counting operations are carried out at multiple difficulty levels for cross-task binary classification. The proposed study utilizes EEG data of small duration (1 s) from the onset of the stimulus, which is further divided into multiple overlap segments for cross-task analysis.

The remainder of this paper is structured as follows. Section 2 presents the related literature work. Section 3 introduces the dataset for the two visual tasks and the EEG data acquisition process. Section 4 presents the methodology used throughout the experimentation process. Section 5 provides the implementation details. Section 6 presents the results to evaluate the performance of the proposed method. Section 7 presents the discussion. Finally, Section 8 draws the conclusion.

2. LITERATURE WORK

Cognitive workload estimation is crucial in healthcare, learning, and support. For example, it is very effective in diagnosing various brain disorders, designing assistance systems for BCI applications, and classifying the cognitive workload levels of operators in safety and surveillance systems [14–16]. Two approaches, statistical and deep learning, are widely used with physiological signals (EEG) for cross-problem cognitive workload estimation. Very limited studies are available on cross-task to date; most of the existing literature is reviewed in this section.

2.1 Statistical Approach

With the invention of different machine learning techniques, it has become relatively simple to discern various characteristics of a complex EEG signal [17,18]. The existing studies follow learning models with statistical methods (logistic regression, Naive Bayes, SVM), which use hand-crafted features to classify cognitive load levels. In [19], the power of multiple frequency bands from segmented EEG data is utilized for cross-task classification with an artificial neural network using a multi-layer perceptron, resulting in a low classification accuracy of 44.8%. In [17], event-related synchronization and de-synchronization are utilized with the support vector machine (SVM) classifier; the cross-task classification accuracy is 44.8%. In another parallel work [18], the power spectral density (PSD) of segmented data from six frequency bands is utilized as a feature. The performance of the SVM classifier and regression model is near chance level for cross-task. Further, recursive feature elimination-based feature ranking and feature-subset selection are used, where the SVM regression model outperforms the SVM classifier.

In [3], a study is carried out where the MW estimator is trained on a simple task and tested on a complex task. An SVM regression model shows improved performance (correlation coefficient = 0.740 ± 0.147) using the feature subset of PSD from seven frequency bands selected with recursive feature elimination. In [4], models of behavior are developed to identify common cross-participant and cross-task EEG features. Sequential forward floating selection is used to identify the independent components (ICs) of the tasks. Further, linear regression
is used on the power spectra of the ICs, and a continuous estimate is generated for cross-subject cognitive load recognition. The findings suggest that the complex behavioral dynamics are estimated with concurrent measures from EEG using a common finite feature set. A new avenue of mapping the EEG source space related to various frequency bands in the functional network is studied in [5]. The classification feature subset is selected based on a sequential feature selection algorithm, which selects the topmost common features for cross-task, and is classified with 87% accuracy using an SVM classifier. In [20], the work explored a domain adaptation method for cross-task cognitive workload recognition. The working memory and mental arithmetic tasks can individually be viewed as domains. PSD and coherence features from the brain connectivity network are extracted from five different frequency bands. Among SVM and KNN, SVM produces the higher binary classification accuracy (around 70%) for transfer joint matching adaptation. Another parallel work [21] uses domain adaptation techniques based on transfer component analysis; it achieves 30.0% classification accuracy for a 4-class cross-dataset problem using sparse encoded representations of the decomposed wavelets. In [22], task-independent mental workload is discriminated across two working memory tasks. EEG spectral features (PSD) and functional connectivity features (phase lag index) are fused and classified with an accuracy of 91% using SVM and linear discriminant analysis (LDA) classifiers. In [23], the study compares cross-task mental workload consistency with the PSD of ongoing EEG and task-irrelevant auditory ERPs using the verbal N-back and the multi-attribute task battery. The output of the discriminant analysis shows a statistical difference between both features used, and the area under the curve is near to one. In [24], the work elaborates on studying realistic instructional materials for the optimum workload. The work uses SVM for cross-task classification of different levels of WML utilizing spectral features from multiple frequency bands, and achieves around 80% accuracy. In [25], the work assesses cross-task mental workload during anomaly detection in perceived visual stimuli of images and video. The study utilizes multimodal signals like electrocardiogram, electrooculogram, skin response, etc. LDA is used for feature reduction, where a subset of features with the largest Fisher separation value is considered. The SVM classifier results in 53.83% classification accuracy.

It is argued that hand-crafted features lack in the extraction of crucial features from nonlinear EEG data [26]; therefore, deep networks are employed as classifiers and for automatic feature extraction.

2.2 Deep Learning Approach

Recently, deep learning methods have shown immense improvement in various biomedical field applications [8,27]. Cross-problem estimation has also been investigated with the utilization of deep structures to reach an acceptable performance level. In [28], CNN methods supporting single and double models are utilized along with novel fusion strategies of different networks. The CNN model is fed with spectral maps (derived with the fast Fourier transform) as 2D images, which hold the EEG signal’s spectral, spatial, and temporal information.

In [29], a cross-participant model is explored through task-generic EEG features. A multi-layer perceptron neural network (MLPNN) is utilized to extract spectral features from five frequency bands. A temporal convolutional network (TCN) with an autoencoder is employed to analyze time-domain-based features from raw EEG data. MLPNN performs best, with 64% accuracy. In [30], the differences in the EEG distributions of different subjects are alleviated through the implementation of an online pre-alignment scheme (OPS) to balance the impact of cross-dataset variability for motor imagery datasets. OPS employs a recentering step prior to the training of a deep model with the usage of the Riemannian mean covariance. OPS also improves generalization ability across datasets and results in around 75% cross-subject classification accuracy for various datasets. In [31], the cross-participant cognitive state in a non-stimulus-locked task environment is estimated. PSD values are normalized through the creation of a cumulative distribution function at each time step for every electrode/frequency combination and are used to create the input for a CNN. It is concluded that a novel convolutional-recurrent model using multi-path subnetworks and bi-directional, residual recurrent layers results in improved predictive accuracy (86.8%) and decreased variance.

In [32], a structure of deep recurrent and 3D-CNN is combined (R3DCNN) for automatic learning of features and performs cross-task binary classification. This existing work uses the Morlet wavelet transformation to map the frequency and time dimensions and uses EEG cubes formed from topographic maps as input images for the 3DCNN. The R3DCNN achieves an average accuracy of 88.9%, which is a significant increase compared to the state-of-the-art methods. In our previous work [33], temporal dynamics are modeled by grouping successive time segments of the signal to form variable-length frames, and hand-crafted features from the statistical (band power),
morphological (curve length), and nonlinear (approximate entropy) domains for these frames from four frequency bands are used. A deep RNN employing LSTM results in 92.8% binary cross-task classification accuracy.

In recent years, different existing works have explored multiple ways of improving the performance of deep models: by feeding a pre-processed EEG signal as the feature set [29], by transforming the pre-processed signal into 2D images [32], by applying initialization techniques [34] at the beginning layer of the model, and by improving the structure of the deep network [31]. The pre-trained, CNN-based deep models like AlexNet [35], VGG16, VGG19 [36], ResNet18, ResNet50 [37], and Inception-v3 [38] are also efficient through transfer learning in feature extraction [8,9]. In [39], a steerable filter structure is incorporated into the deep CNN to form a new model, Gabor-CNN, where the basic convolutional operation is manipulated based on Gabor filters. The work demonstrates that Gabor-CNN is capable of learning features efficiently for object recognition and has fewer learnable parameters, which improves its training. In [10], the 1D Gabor function is used to obtain features from EEG data, which are fed to a ridge regression classifier. Further, the obtained feature set is fused with the visual feature set (obtained with a deep CNN) and is used to train a combined visual-EEG classifier, which boosts visual recognition. In the literature, Gabor filters are also used efficiently in various analyses of EEG, such as sleep disorders and seizures [40,41].

The visual stimulus designed to induce cognitive workload is displayed on the monitor screen.

3.2 Experimental Tasks

Two visual tasks, utilizing basic trigonometric shapes and an abstract shape of basic red, green, and blue colors for their identification and counting, are executed in the proposed work to induce cognitive workload. The tasks will from now on be referred to as “shape” and “color”, respectively, throughout this paper. As in our previously published work, the proposed work considers only level-1 and level-4 (the simple and complex levels, respectively) for cross-task analysis out of the 4 designed levels [33,42,43]. The visual stimulus is depicted in Figure 1.

The single-trial timeline is depicted in Figure 2, consisting of a 7 s visual stimulus, followed by a 2 s blank period and a maximum 5 s response time. In the response time, a participant must provide the count of any one asked trigonometric shape and any one color of balloon, which they memorize. Every individual level consists of 10 trials, with a 30 s break between successive levels to diminish the effect of fatigue if experienced by participants. Permutation of the trigonometric shapes and colors among trials in the respective task avoids adaptation. Each experimental task lasts for approximately 15 min. A NASA Task Load Index score related to the workload level experienced during task performance was filled in by 30 subjects who participated in a similar kind of dummy experimentation. It validates the correctness of the designed experimentation
Figure 3: (A) The proposed 2D image generation method: the segmented EEG data is convolved with 8 Gabor filters, the resultants of each single spatial frequency scale with four orientations are cascaded, and these two cascades are then clubbed vertically to form the 2D image. (B) The proposed concatenated structure of distributed CNNs and RNN: CARNN
tasks. More details on the experimentation tasks can be found in [33,42].

4. METHODOLOGY

The various processes carried out during experimentation for cross-task classification are discussed in this section. The complete process flow is depicted in Figure 3.

4.1 Data Normalization

Before data normalization, the electrode sequence is rearranged in the captured data, where electrode pairs from the left and right hemispheres are arranged side by side. The newly arranged sequence is FP1, FP2, AF3, AF4, F3, F4, F7, F8, FZ, FC1, FC2, FC5, FC6, T7, T8, C3, C4, CZ, CP1, CP2, CP5, CP6, P3, P4, P7, P8, PZ, PO3, PO4, O1, O2, OZ. The significance of this step is presented in the discussion section. Due to the variability of the data among the multiple levels of an individual, the data must be normalized. Z-score normalization is used in the proposed work. Normalization of the data contributes to effective learning by deep networks [44].

4.2 Time Segmentation

Time segmentation is a technique for extracting data and identifying components of small time segments from a long EEG time series [45]. The proposed work employs a windowing technique where the normalized time series of 1 s (from the onset of the stimulus period) is divided into 200 ms segments with an overlap period of 100 ms.

4.3 Input Image Formation for CARNN

The resemblance of 2D Gabor filters to the receptive fields of neurons in the V1 area of the visual cortex motivated the research community, and Gabor filters have sparked renewed interest in a variety of computer vision applications. In classification tasks, the utilization of filters based on the target data characteristics improves the performance of the deep model and requires less training data [34].

Thus, the images formed with Gabor filtered data are an initialization, which helps the CNN to extract more discriminative features. Gabor filters offer optimal localization in the spatial as well as the frequency domain. They are also found to be less vulnerable to noise, rotation, scaling, and a small range of translation [46]. In the spatial domain, the complex form of a 2D Gabor filter is expressed as

g(x, y, λ, θ, ψ, σ, γ) = exp(−(x′² + γ²y′²)/(2σ²)) · exp(i(2πx′/λ + ψ))   (1)

where

x′ = x cos θ + y sin θ  and  y′ = −x sin θ + y cos θ   (2)

In Equation (1), λ scales the frequency of the sinusoidal modulation, θ is the orientation, representing the angle of the major axis, ψ is the phase offset, σ scales the falloff of the Gaussian envelope, and γ specifies the ratio of x to y, called the spatial aspect ratio. This complex function can be handled easily by breaking it down into its real and imaginary parts, referred to as the even and odd functions:

f_R(x, y, λ, θ, ψ, σ, γ) = exp(−(x′² + γ²y′²)/(2σ²)) cos(2πx′/λ + ψ)   (3)
f_I(x, y, λ, θ, ψ, σ, γ) = exp(−(x′² + γ²y′²)/(2σ²)) sin(2πx′/λ + ψ)   (4)
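As a concrete illustration of Sections 4.1–4.3, the sketch below z-score normalizes a (channels × samples) EEG array, splits it into 200 ms windows with 100 ms overlap, builds the eight Gabor kernels of Equations (1)–(4) (two spatial frequency scales, four orientations), and tiles the filtered responses into one image per window in the manner of Figure 3 (A). It is a minimal NumPy/SciPy sketch: the sampling rate, kernel size, λ, σ, γ, and ψ values are assumed placeholders, not the authors’ settings, and the resize to the 224 × 224 network input is left to a later stage.

```python
import numpy as np
from scipy.signal import convolve2d

FS = 1000                      # assumed sampling rate in Hz (not stated in this excerpt)
SEG_MS, HOP_MS = 200, 100      # 200 ms windows with 100 ms overlap (Section 4.2)

def gabor_kernel(lam, theta, sigma=2.0, gamma=0.5, psi=0.0, size=15, part="real"):
    """2D Gabor filter of Equations (1)-(4); `part` selects the even (cos) or odd (sin) component."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xp = x * np.cos(theta) + y * np.sin(theta)        # Equation (2)
    yp = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xp**2 + gamma**2 * yp**2) / (2.0 * sigma**2))
    carrier = np.cos if part == "real" else np.sin    # Equation (3) or (4)
    return envelope * carrier(2.0 * np.pi * xp / lam + psi)

def zscore(eeg):
    """Z-score normalization per electrode (Section 4.1); eeg has shape (channels, samples)."""
    return (eeg - eeg.mean(axis=1, keepdims=True)) / (eeg.std(axis=1, keepdims=True) + 1e-8)

def segment(eeg, fs=FS, seg_ms=SEG_MS, hop_ms=HOP_MS):
    """Split a 1 s (channels, samples) array into overlapping windows (nine for 200/100 ms)."""
    win, hop = int(fs * seg_ms / 1000), int(fs * hop_ms / 1000)
    return [eeg[:, s:s + win] for s in range(0, eeg.shape[1] - win + 1, hop)]

def gabor_image(window, scales=(4.0, 8.0), orientations=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Convolve one window with the 8 real Gabor kernels and tile the responses as in Figure 3 (A):
    four orientations cascaded horizontally per scale, the two scales stacked vertically."""
    rows = []
    for lam in scales:                                # two spatial frequency scales
        responses = [convolve2d(window, gabor_kernel(lam, th), mode="same")
                     for th in orientations]          # four orientations
        rows.append(np.hstack(responses))
    return np.vstack(rows)

# Channels are assumed to be already reordered into left/right electrode pairs (Section 4.1).
raw = np.random.randn(32, FS)                         # placeholder for 1 s of 32-channel EEG
images = [gabor_image(w) for w in segment(zscore(raw))]
print(len(images), images[0].shape)                   # 9 Gabor decomposed data images per trial
```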
4.4.1 Proposed CNN with Residual Identity Mapping Connections

In the proposed work, nine distributed CNN structures (the 1 s data is divided into nine overlap segments) are used, as depicted in Figure 3. The individual distributed CNN branch structure is depicted in Figure 4, where Block 1–Block 5 are connected. The CNN structure is made of multiple layers, such as a convolutional layer, which is generally followed by normalization and an activation function, as depicted in Figure 5 (A), and finally a fully connected layer is used. The CNN learns spatial and spectral features from the non-stationary EEG data. Distributed CNNs can apply the same transformation to a list of inputs. The CNN structure is created by stacking convolution blocks and residual blocks. The residual block helps in optimal gradient flow through the presence of identity skip connections, preventing the loss of information flow through the various network layers. The normal residual block shown in Figure 5 (B) functions as follows:

y_l = h(x_l) + F(x_l, W_l)   (5)

where x_l and y_l are the input and output of the l-th residual unit, F(·) is the residual function, and h(x_l) = x_l is the identity mapping. This identity mapping provides a direct path for signal propagation from any one residual unit to any other residual unit in the network during the forward and backward passes. The pre-activation residual connection, Res Block, is depicted in Figure 5 (C), where batch normalization (BN) and the rectified linear unit (ReLU) are connected before the convolutional layer [47]. The identity skip connection with a 1 × 1 convolutional layer, depicted in Figure 5 (D), is mainly used to extract the information stored in the spatial dimension of the Gabor decomposed data images. The BN layer and ReLU activation function are used to accelerate convergence, which improves network performance. The Flatten layer at the end of the CNN structure is used to convert the output of the last block (Block 5) into a feature vector. The input and output feature sizes for each block are mentioned in Figure 6; the last parameter of the output represents the total number of filters used for the convolutional layers of that block. Despite the ability of the CNN to capture crucial spatial features from the data through the convolution operation, it cannot map a complex time relationship in logical sequences.
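For reference, a minimal TensorFlow/Keras sketch of the convolution block and the three residual variants discussed above is given here: R1 adds the input to the block output as in Equation (5) with an identity h(·), R2 places BN and ReLU before each convolution (pre-activation), and R3 carries a 1 × 1 convolution on the skip path. The filter counts, pooling, and branch depth are placeholders rather than the authors’ exact Block 1–Block 5 configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    """'Conv Block' of Figure 5 (A): convolution followed by normalization and activation."""
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def res_block_r1(x, filters):
    """R1, normal residual connection: y_l = h(x_l) + F(x_l, W_l) with identity h (Equation (5))."""
    f = conv_block(x, filters)
    f = layers.Conv2D(filters, 3, padding="same")(f)
    f = layers.BatchNormalization()(f)
    return layers.ReLU()(layers.Add()([x, f]))

def res_block_r2(x, filters):
    """R2, pre-activation 'Res Block' of Figure 5 (C): BN and ReLU come before each convolution [47]."""
    f = layers.ReLU()(layers.BatchNormalization()(x))
    f = layers.Conv2D(filters, 3, padding="same")(f)
    f = layers.ReLU()(layers.BatchNormalization()(f))
    f = layers.Conv2D(filters, 3, padding="same")(f)
    return layers.Add()([x, f])

def res_block_r3(x, filters):
    """R3 of Figure 5 (D): the identity skip is replaced by a 1 x 1 convolution."""
    shortcut = layers.Conv2D(filters, 1, padding="same")(x)
    f = conv_block(x, filters)
    f = layers.Conv2D(filters, 3, padding="same")(f)
    f = layers.BatchNormalization()(f)
    return layers.ReLU()(layers.Add()([shortcut, f]))

def cnn_branch(input_shape=(224, 224, 1), widths=(16, 32, 64, 64, 128)):
    """One distributed CNN branch: a Block 1-Block 5 style stack ending in a Flatten layer."""
    inp = layers.Input(shape=input_shape)
    x = conv_block(inp, widths[0])
    for w in widths[1:]:
        x = layers.MaxPooling2D()(x)
        x = conv_block(x, w)            # widen with a Conv Block
        x = res_block_r2(x, w)          # then refine with a residual block (R1/R3 are drop-ins)
    return tf.keras.Model(inp, layers.Flatten()(x), name="cnn_branch")
```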
Figure 5: (A) Structure of the convolutional block “Conv Block”. (B) R1: normal residual connection. (C) R2: “Res Block”, pre-activation residual connection. (D) R3: residual connection with 1 × 1 convolution in the identity skip path
Figure 6: The proposed layered deep structure with the input and output of each Block. The kernel size for all the convolution layers is 3 × 3 for Conv Block and Res Block, except for the 1 × 1 convolution in the shortcut path of Figure 5 (D). The number of channels in the output represents the total number of filters used for the respective convolution layers

i_t and C̃_t are then combined as given below, to decide the new input for the cell state:

c_t = f_t ∗ c_{t−1} + i_t ∗ C̃_t   (13)

The cell state serves as the memory of the LSTM. After finalizing the cell state, the output gate is used to place a selected amount of the cell state into the output.
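Only the cell-state update (13) survives in this excerpt; the remaining gate equations are the standard LSTM ones. For illustration, the sketch below wires the temporal half of CARNN in TensorFlow/Keras: the nine per-segment feature vectors produced by the distributed CNN branches pass through an LSTM, and a softmax-normalized temporal attention (a common additive formulation assumed here, in the spirit of Figure 8, rather than the paper’s exact scoring) pools the time steps before classification. The sequence length, feature dimension, and unit counts are placeholders.

```python
import tensorflow as tf
from tensorflow.keras import layers

N_SEGMENTS = 9        # nine 200 ms overlap segments per 1 s trial
FEAT_DIM = 512        # placeholder length of each flattened CNN feature vector

def temporal_attention(h):
    """Score every LSTM time step, softmax-normalize the scores, and return the weighted sum,
    so that time steps holding discriminative activity receive more prominence."""
    scores = layers.Dense(1, activation="tanh")(h)            # (batch, T, 1)
    weights = layers.Softmax(axis=1)(scores)                  # attention over the T time steps
    return layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([h, weights])

def temporal_head(n_classes=2, units=128):
    """LSTM + attention + classifier over the sequence of distributed-CNN feature vectors."""
    seq = layers.Input(shape=(N_SEGMENTS, FEAT_DIM))
    h = layers.LSTM(units, return_sequences=True)(seq)        # the LSTM cell applies (13) at every step
    context = temporal_attention(h)
    out = layers.Dense(n_classes, activation="softmax")(context)
    return tf.keras.Model(seq, out, name="temporal_head")

temporal_head().summary()
```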
Figure 8: Temporal attention mechanism with LSTM

5. EXPERIMENTATION

This section discusses the implementation details and provides the various metrics used for the performance evaluation of the proposed method.
5.2 Evaluation Metrics

To build an effective learning model, the evaluation step is very important. The performance evaluation metrics used for the cross-task classification model are as follows:

Accuracy: Represents the total number of correct predictions by the model.
Sensitivity: Represents the proportion of actual “one class” data that is correctly identified.
Specificity: Represents the proportion of actual “another class” data that is correctly identified.
Precision: Represents the proportion of correctly identified one-class data.
F1-Score: Represents the harmonic mean of precision and sensitivity.

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (19)

Sensitivity (Recall) = TP / (TP + FN)   (20)

Specificity = TN / (TN + FP)   (21)

Precision (Positive Predictive Value) = TP / (TP + FP)   (22)

F1-Score = 2 ∗ (PPV ∗ Sensitivity) / (PPV + Sensitivity)   (23)

where TP (true positive), TN (true negative), FP (false positive), and FN (false negative) are the confusion matrix parameters.

6. RESULTS

In the proposed work, the raw EEG data are normalized and segmented with a window size of 200 ms and an overlap period of 100 ms. These segmented data are decomposed using the Gabor filters and converted into images considering only the real part of the complex function discussed under the image formation subsection. Then CARNN is employed to automatically extract features from the spatial and temporal domains and to classify the binary load levels of the cross-task. The behavioral response for subjective performance can be referred to from our published work using the same dataset for cross-task classification [33]. The accuracy of giving the correct answer is higher for the simple level, and participants take less time to respond, while for the complex level, the accuracy is lower, and participants take more time to respond. This fact is also supported by the NASA TLX scores provided by the participants, indicating that they really do find the complex level tougher than the simple level. Further, it is justified with the paired t-test that the easy level and difficult level of both tasks are discriminable (p < 0.05 is significant), and that both tasks are of a similar kind (p > 0.05 is not significant).

6.1 Classification of Within-Task Cognitive Load

In the proposed method, only 1 s of EEG data from the onset of the visual stimulus is considered for all the analyses. A 1 s duration is sufficient for the perception and identification of objects through memory activity; thus, load levels can be identified. The within-task performance evaluation for the shape and color tasks is carried out independently for various durations of time segments, 50, 100, 200, and 250 ms, as mentioned in Table 1, with 50% overlap duration. The number of distributed CNNs is equal to the total number of segments of the prescribed duration accommodated in the 1 s data. For within-task classification, 80% of the task data is used for training and validation. The remaining 20% of the data from the same task is used for testing. This evaluation is carried out for images formed with only the real component of the Gabor function. The images formed with the imaginary component of the Gabor function are not considered further for any of the analyses. The highest within-task classification accuracy is 96.9% and 94.9% for the shape and color tasks, respectively, for a segment duration of 200 ms (with 100 ms overlap). Thus, the 200 ms segment duration is utilized to evaluate the cross-task performance. With a 200 ms segment duration, there is a total of 3960 (44 subjects × 10 questions in each level × nine overlap segments in 1 s data) images for the easy level and 3960 images for the difficult level for both tasks independently.

6.2 Classification of Cross-Task Cognitive Load

Cross-task classification is performed in the following two steps: in the first step, the model is trained on the shape task and tested on the color task; in the second step, the model is trained on the color task and tested on the shape task (Table 3).
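A minimal sketch of this two-step protocol together with metrics (19)–(23), assuming hypothetical arrays X_shape, y_shape, X_color, y_color of image sequences and binary load labels, and a build_model() constructor standing in for a compiled CARNN:

```python
import numpy as np

def metrics(y_true, y_pred):
    """Equations (19)-(23) from the confusion matrix counts, with class 1 treated as 'positive'."""
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    acc = (tp + tn) / (tp + tn + fp + fn)                 # (19)
    sen = tp / (tp + fn)                                  # (20)
    spe = tn / (tn + fp)                                  # (21)
    ppv = tp / (tp + fp)                                  # (22)
    f1 = 2 * ppv * sen / (ppv + sen)                      # (23)
    return dict(accuracy=acc, sensitivity=sen, specificity=spe, precision=ppv, f1=f1)

def cross_task(build_model, X_shape, y_shape, X_color, y_color, epochs=10):
    """Step 1: train on shape, test on color. Step 2: train on color, test on shape. Average both."""
    results = []
    for (X_tr, y_tr), (X_te, y_te) in [((X_shape, y_shape), (X_color, y_color)),
                                       ((X_color, y_color), (X_shape, y_shape))]:
        model = build_model()                             # assumed to return a compiled Keras model
        model.fit(X_tr, y_tr, epochs=epochs, validation_split=0.2, verbose=0)
        y_pred = np.argmax(model.predict(X_te, verbose=0), axis=1)   # softmax output -> class index
        results.append(metrics(y_te, y_pred))
    return {k: float(np.mean([r[k] for r in results])) for k in results[0]}
```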
Table 1: Within-task classification accuracy for various segment durations (with 50% overlap of the mentioned duration) for images formed with only the real component of the Gabor function. Acc., Accuracy; Sen., Sensitivity; Spe., Specificity (bold face values indicate the highest value of the evaluation metrics, obtained for the 200 ms segment duration)
50 ms 100 ms 200 ms 250 ms
Acc. Spe. Sen. Acc. Spe. Sen. Acc. Spe. Sen. Acc. Spe. Sen.
50 ms 100 ms 200 ms 250 ms
Acc. Spe. Sen. Acc. Spe. Sen. Acc. Spe. Sen. Acc. Spe. Sen.
Shape 93.9 92.7 95.1 94.6 93.3 95.9 96.9 96.5 97.3 94.8 94.0 95.6
Color 92.1 94.5 89.7 92.8 94.7 90.9 94.9 92.8 97.0 93.2 95.0 91.4
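For reference, the segment counts underlying Table 1 and the image totals quoted above follow directly from the window and overlap lengths; a quick check (assuming this is how the segments are counted):

```python
def n_segments(total_ms=1000, window_ms=200, overlap_ms=100):
    """Number of overlapping windows that fit in the 1 s epoch."""
    hop = window_ms - overlap_ms
    return (total_ms - window_ms) // hop + 1

for w in (50, 100, 200, 250):
    print(w, "ms ->", n_segments(window_ms=w, overlap_ms=w // 2), "segments per trial")

# 200 ms windows with 100 ms overlap give 9 segments, hence
print(44 * 10 * n_segments())   # 3960 images per load level per task
```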
Figure 9: Model convergence for all five validation folds for color and shape training. The model converges in around ten epochs, with 100% accuracy
Figure 11: Cross-task classification accuracy comparison with traditional classifiers

Table 2: Parameters and input image sizes for the pre-trained networks and the proposed network
Model            Approximate Parameters   Input Image Size
AlexNet          62 M                     227 × 227 × 3
VGG16            138 M                    224 × 224 × 3
VGG19            143 M                    224 × 224 × 3
ResNet18         11.5 M                   224 × 224 × 3
ResNet50         26 M                     224 × 224 × 3
Inception-v3     23.8 M                   299 × 299 × 3
Proposed CARNN   2.9 M                    224 × 224 × 1

block and overfitting of the network. The performance of a very deep network on EEG data is not very promising; as the network goes deeper, the performance degrades [49]. ResNet50 shows degraded performance because of its greater depth despite the presence of residual blocks. ResNet18 shows good performance, but even there the degradation in performance might be due to depth. This depth issue is also verified empirically in the proposed work, which is discussed in the subsequent section. Inception-v3 is a wider model with multiple filters of different sizes at the same level. The performance of this model is good compared to the previously mentioned pre-trained networks, but Inception-v3 is computationally very expensive and requires around 23.8 M parameters, whereas the proposed model requires 2.9 M parameters. The computed values of all the metrics are mentioned in Table 3 for all the models, along with the average of both steps.

The cross-task challenge for cognitive workload classification has been explored by different existing studies. This part presents the comparison of the proposed model with existing studies from quantitative and qualitative standpoints, as the studies differ in their experimental tasks and implementation details. Some of the quantifiable attributes for cross-task research are discussed in Table 4. In [19], ANNs are trained on short-duration EEG data where participants engaged in task performance with no prior exposure to that task; the observed results could not even meet the general probability limit. In [17], as well, the obtained cross-task results are not significant over the chance level for working memory tasks using an SVM classifier. The recursive feature elimination algorithm is employed in [18] to identify a subset of features for cross-task; the classification accuracy using n-back tasks deteriorates. The cross-task problem is dealt with using a functional connectivity network in [5], which utilizes common discriminative features from the tasks, and the SVM classifier provides a better classification accuracy of 87%. From this work it can be deduced that the methods used for feature selection play an important role along with a good classifier, which makes generalization of the model difficult. Deep learning architectures, with their great success in object recognition, have recently been exploited for mental workload classification. In [29], MLPNN along with a TCN autoencoder extracts frequency- and time-domain-based EEG features. The work concludes that a vigilance decrement can be classified using EEG for an unseen individual and an unseen task.
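For context, pre-trained baselines such as those in Tables 2 and 3 are typically adapted through transfer learning (cf. [8,9]). A minimal Keras sketch for one of them is shown below, assuming ImageNet weights, a frozen backbone, and the single-channel Gabor images replicated to three channels to match the expected 224 × 224 × 3 input; input preprocessing and fine-tuning details are omitted, so this is an illustrative baseline setup rather than the authors’ exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def resnet50_baseline(n_classes=2):
    """ResNet50 backbone (224 x 224 x 3, Table 2) with a new two-class head for the load levels."""
    base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                          input_shape=(224, 224, 3), pooling="avg")
    base.trainable = False                              # freeze the backbone; fine-tune later if needed
    gray = layers.Input(shape=(224, 224, 1))            # Gabor decomposed data image
    rgb = layers.Concatenate()([gray, gray, gray])      # replicate the single channel to three
    out = layers.Dense(n_classes, activation="softmax")(base(rgb))
    return tf.keras.Model(gray, out, name="resnet50_baseline")
```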
Table 3: Comparison of proposed model with other deep structures for cross-task classification with 200 ms segment duration
(100 ms overlap). Acc., Accuracy; Sen., Sensitivity; Spe., Specificity; Pre., Precision; F1Sco., F1-Score
                Train: Shape, Test: Color            Train: Color, Test: Shape            Average
Model           Acc.  Sen.  Spe.  Pre.  F1Sco.       Acc.  Sen.  Spe.  Pre.  F1Sco.       Acc.  Sen.  Spe.  Pre.  F1Sco.
AlexNet 67.5 60.0 75.0 70.6 64.9 65.0 70.0 60.0 63.6 66.7 66.3 65.0 67.5 67.1 65.8
VGG19 72.0 77.0 67.0 70.0 73.3 74.0 72.0 76.0 75.0 73.5 73.0 74.5 71.5 72.5 73.4
VGG16 75.0 78.0 72.0 73.6 75.7 71.0 66.3 75.7 73.3 67.5 73.0 72.2 73.9 73.3 71.6
ResNet18 92.1 93.1 91.1 91.2 92.1 89.0 85.0 93.0 92.4 88.5 90.6 89.1 92.1 91.8 90.3
ResNet50 85.2 88.4 81.8 83.0 84.4 81.0 76.0 86.0 84.4 80.0 83.1 82.2 83.9 83.7 82.2
Inception-v3 92.1 93.2 91.0 91.2 92.1 88.0 90.0 86.0 86.5 88.2 90.1 91.6 88.5 88.9 90.2
CRNN 92.0 92.0 92.0 92.0 92.0 91.6 91.1 92.1 91.9 91.5 91.8 91.6 92.5 92.0 91.8
CARNN 94.6 93.0 93.2 96.2 94.7 93.8 92.0 92.3 95.6 93.9 94.2 92.5 92.8 95.9 94.3
In [32], the robust EEG features are learned from the topographic maps and event-related spectral perturbation with a 3DCNN, followed by temporal feature learning through an RNN. The work reported a cross-task classification accuracy of 88.9%, which is higher than that obtained with traditional methods. In our previous work [33], temporal modeling is carried out on the same dataset as a pre-processing stage. The proposed work instead forms Gabor decomposed images from the pre-processed data, which are beneficial for automatic spatial feature extraction from the distributed branch of CNN networks. Thus, instead of modulating the filters of the network with the Gabor function, Gabor decomposed data images are used as input, as some of the CNN layer kernels are similar to Gabor filters. The RNN learns temporal dependencies in the time series data, and the attention mechanism enhances the weights of the time instances holding crucial information. The result obtained for cross-task classification is promising, with an obtained accuracy of 94.2%. The high performance of the proposed method can be attributed to the Gabor filtered data images having spatial frequency information with various orientations, and to the capability of CARNN to learn the spatial and temporal features for the classification of cognitive workload efficiently.

Table 4: Comparison of the proposed work with existing studies for cross-task analysis. Par., Participant; CV, Cross-validation. (Columns: Method; Par.; Signal Duration; Impl. Details; Acc.; Sen.; Spe.)

7. DISCUSSION
being 79.8%, with around double the number of parameters for the network. Further, the attention mechanism is added to Model 1, referred to as Model 2, and it results in improved accuracy. The residual skip identity connection has recently proved its importance in various vision research studies. So, three different types of residual blocks, namely R1, R2, and R3, as depicted in Figure 5 (B), (C), and (D), respectively, are tested for the EEG dataset. The CNN structure is then extended by incorporating R1, R2, and R3 individually. The classification accuracy results for these three types of residual blocks are mentioned in Table 6, referred to as Model 3, Model 4, and Model 5.

8. CONCLUSION

The proposed method explores cognitive workload classification for cross-task using the CARNN model. Distributed CNNs are used to learn the spatial features of the data, which benefits from the usage of Gabor filtered data images in the beginning layers of the network. The automatic learning of features from the temporal domain by the RNN proves to be very effective when supported by an attention mechanism, which contributes to the improvement of the performance of the model. A small duration (1 s) of EEG data from the onset of the stimulus is sufficient to identify discriminative features for cognitive load classification. The proposed method generalizes the cross-task classification framework with the usage of EEG data without any noise removal, Gabor featured images, and the deep CARNN structure. A classification accuracy of 94.2% is obtained for cross-task, where the model trained on one task can efficiently tackle another task for classification. An extensive study of the Gabor filter with multiple scales and more orientations can be carried out in the future to gain better insight into multilevel cognitive workload estimation and the optimization of electrodes for real-time processing.

REFERENCES

1. J. Sweller, “Cognitive load theory,” Psychol. Learn. Motiv., Vol. 55, pp. 37–76, 2011. Elsevier. DOI:10.1016/B978-0-12-387691-1.00002-8

2. P. Antonenko, F. Paas, R. Grabner, and T. Van Gog, “Using electroencephalography to measure cognitive load,” Educ. Psychol. Rev., Vol. 22, no. 4, pp. 425–438, 2010. DOI:10.1007/s10648-010-9130-y

3. Y. Ke, et al., “An EEG-based mental workload estimator trained on working memory task can work well under simulated multi-attribute task,” Front. Hum. Neurosci., Vol. 8, no. 703, pp. 1–10, 2014. DOI:10.3389/fnhum.2014.00703
4. J. Touryan, B. J. Lance, S. E. Kerick, A. J. Ries, and K. McDowell, “Common EEG features for behavioral estimation in disparate, real-world tasks,” Biol. Psychol., Vol. 114, pp. 93–107, 2016. DOI:10.1016/j.biopsycho.2015.12.009

5. G. N. Dimitrakopoulos, I. Kakkos, Z. Dai, J. Lim, J. J. de Souza, A. Bezerianos, and Y. Sun, “Task-independent mental workload classification based upon common multiband EEG cortical connectivity,” IEEE Trans. Neural Syst. Rehabil. Eng., Vol. 25, no. 11, pp. 1940–1949, 2017. DOI:10.1109/TNSRE.2017.2701002

6. R. G. Hefron, and B. J. Borghetti, “A new feature for cross-day psychophysiological workload estimation,” in 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), 2016, pp. 785–790. IEEE.

7. J. Zhang, Y. Wang, and S. Li, “Cross-subject mental workload classification using kernel spectral regression and transfer learning techniques,” Cogn. Technol. Work, Vol. 19, no. 4, pp. 587–605, 2017. DOI:10.1007/s10111-017-0425-3

8. R. Sharma, R. B. Pachori, and P. Sircar, “Automated emotion recognition based on higher order statistics and deep learning algorithm,” Biomed. Signal. Process. Control., Vol. 58, pp. 101867, 2020. DOI:10.1016/j.bspc.2020.101867

9. A. Singhal, R. Shukla, P. K. Kankar, S. Dubey, S. Singh, and R. B. Pachori, “Comparing the capabilities of transfer learning models to detect skin lesion in humans,” Proceedings of the Institution of Mechanical Engineers, Part H: Journal of Engineering in Medicine, Vol. 234, no. 10, pp. 1083–1093, 2020. DOI:10.1177/0954411920939829

10. N. Cudlenco, N. Popescu, and M. Leordeanu, “Reading into the mind’s eye: boosting automatic visual recognition with EEG signals,” Neurocomputing, Vol. 386, pp. 281–292, 2020. DOI:10.1016/j.neucom.2019.12.076

11. M. Abadi, M. Khoudeir, and S. Marchand, “Gabor filter-based texture features to archaeological ceramic materials characterization,” in International Conference on Image and Signal Processing, 2012, pp. 333–342. Springer.

12. Y. Hamamoto, S. Uchimura, M. Watanabe, T. Yasuda, Y. Mitani, and S. Tomita, “A Gabor filter-based method for recognizing handwritten numerals,” Pattern Recognit., Vol. 31, no. 4, pp. 395–400, 1998. DOI:10.1016/S0031-3203(97)00057-5

13. J. Arrospide, and L. Salgado, “Log-Gabor filters for image-based vehicle verification,” IEEE Trans. Image Process., Vol. 22, no. 6, pp. 2286–2295, 2013. DOI:10.1109/TIP.2013.2249080

14. M. Jiménez-Guarneros, and P. Gomez-Gil, “Custom domain adaptation: a new method for cross-subject, EEG-based cognitive load recognition,” IEEE Signal Process. Lett., Vol. 27, pp. 750–754, 2020. DOI:10.1109/LSP.2020.2989663

15. Ç. İ. Acı, M. Kaya, and Y. Mishchenko, “Distinguishing mental attention states of humans via an EEG-based passive BCI using machine learning methods,” Expert. Syst. Appl., Vol. 134, pp. 153–166, 2019. DOI:10.1016/j.eswa.2019.05.057

16. S. Simpraga, R. Alvarez-Jimenez, H. D. Mansvelder, J. M. Van Gerven, G. J. Groeneveld, S.-S. Poil, and K. Linkenkaer-Hansen, “EEG machine learning for accurate detection of cholinergic intervention and Alzheimer’s disease,” Sci. Rep., Vol. 7, no. 1, pp. 1–11, 2017. DOI:10.1038/s41598-017-06165-4

17. C. Walter, S. Schmidt, W. Rosenstiel, P. Gerjets, and M. Bogdan, “Using cross-task classification for classifying workload levels in complex learning tasks,” in 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, 2013, pp. 876–881. IEEE.

18. Y. Ke, et al., “Towards an effective cross-task mental workload recognition model using electroencephalography based on feature selection and support vector machine regression,” Int. J. Psychophysiol., Vol. 98, no. 2, pp. 157–166, 2015. DOI:10.1016/j.ijpsycho.2015.10.004

19. C. L. Baldwin, and B. N. Penaranda, “Adaptive training using an artificial neural network and EEG metrics for within- and cross-task workload classification,” NeuroImage, Vol. 59, no. 1, pp. 48–56, 2012. DOI:10.1016/j.neuroimage.2011.07.047

20. Y. Zhou, et al., “Cross-task cognitive workload recognition based on EEG and domain adaptation,” IEEE Trans. Neural Syst. Rehabil. Eng., Vol. 30, pp. 50–60, 2022. DOI:10.1109/TNSRE.2022.3140456

21. W. L. Lim, O. Sourina, and L. Wang, “Cross dataset workload classification using encoded wavelet decomposition features,” in 2018 International Conference on Cyberworlds (CW), 2018. IEEE.

22. I. Kakkos, et al., “EEG fingerprints of task-independent mental workload discrimination,” IEEE J. Biomed. Health Inform., Vol. 25, no. 10, pp. 3824–3833, 2021. DOI:10.1109/JBHI.2021.3085131

23. Y. Ke, et al., “Cross-task consistency of EEG-based mental workload indicators: comparisons between power spectral density and task-irrelevant auditory event-related potentials,” Front. Neurosci., Vol. 15, pp. 1–14, 2021. DOI:10.3389/fnins.2021.703139

24. P. Gerjets, C. Walter, W. Rosenstiel, M. Bogdan, and T. O. Zander, “Cognitive state monitoring and the design of adaptive instruction in digital environments: lessons learned from cognitive workload assessment using a passive brain-computer interface approach,” Front. Neurosci., Vol. 8, pp. 385, 2014. DOI:10.3389/fnins.2014.00385

25. G. Zhao, Y.-J. Liu, and Y. Shi, “Real-time assessment of the cross-task mental workload using physiological measures during anomaly detection,” IEEE Transactions on Human-Machine Systems, Vol. 48, no. 2, pp. 149–160, 2018. DOI:10.1109/THMS.2018.2803025
26. P. Zhang, X. Wang, J. Chen, W. You, and W. Zhang, “Spectral and temporal feature learning with two-stream neural networks for mental workload assessment,” IEEE Trans. Neural Syst. Rehabil. Eng., Vol. 27, no. 6, pp. 1149–1159, 2019. DOI:10.1109/TNSRE.2019.2913400

27. S. Madhavan, R. K. Tripathy, and R. B. Pachori, “Time-frequency domain deep convolutional neural network for the classification of focal and non-focal EEG signals,” IEEE Sensors J., Vol. 20, no. 6, pp. 3078–3086, 2020. DOI:10.1109/JSEN.2019.2956072

28. Z. Jiao, X. Gao, Y. Wang, J. Li, and H. Xu, “Deep convolutional neural networks for mental load classification based on EEG data,” Pattern Recognit., Vol. 76, pp. 582–595, 2018. DOI:10.1016/j.patcog.2017.12.002

29. A. Kamrud, B. Borghetti, C. S. Kabban, and M. Miller, “Generalized deep learning EEG models for cross-participant and cross-task detection of the vigilance decrement in sustained attention tasks,” Sensors, Vol. 21, no. 16, pp. 1–24, 2021, Article no. 5617. DOI:10.3390/s21165617

30. L. Xu, M. Xu, Y. Ke, X. An, S. Liu, and D. Ming, “Cross-dataset variability problem in EEG decoding with deep learning,” Front. Hum. Neurosci., Vol. 14, pp. 103, 2020. DOI:10.3389/fnhum.2020.00103

31. R. Hefron, B. Borghetti, C. Schubert Kabban, J. Christensen, and J. Estepp, “Cross-participant EEG-based assessment of cognitive workload using multi-path convolutional recurrent neural networks,” Sensors, Vol. 18, no. 5, pp. 1339, 2018. DOI:10.3390/s18051339

32. P. Zhang, X. Wang, W. Zhang, and J. Chen, “Learning spatial–spectral–temporal EEG features with recurrent 3D convolutional neural networks for cross-task mental workload assessment,” IEEE Trans. Neural Syst. Rehabil. Eng., Vol. 27, no. 1, pp. 31–42, 2019. DOI:10.1109/TNSRE.2018.2884641

33. S. S. Gupta, T. J. Taori, M. Y. Ladekar, R. R. Manthalkar, S. S. Gajre, and Y. V. Joshi, “Classification of cross task cognitive workload using deep recurrent network with modelling of temporal dynamics,” Biomed. Signal. Process. Control., Vol. 70, pp. 103070, 2021. DOI:10.1016/j.bspc.2021.103070

34. A. Alekseev, and A. Bobe, “GaborNet: Gabor filters with learnable parameters in deep convolutional neural network,” in 2019 International Conference on Engineering and Telecommunication (EnT), 2019, pp. 1–4. IEEE.

35. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Adv. Neural. Inf. Process. Syst., Vol. 25, pp. 1097–1105, 2012.

36. K. Simonyan, and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.

37. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

38. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.

39. S. Luan, C. Chen, B. Zhang, J. Han, and J. Liu, “Gabor convolutional networks,” IEEE Trans. Image Process., Vol. 27, no. 9, pp. 4357–4366, 2018. DOI:10.1109/TIP.2018.2835143

40. T. Sunil Kumar, and V. Kanhangad, “Gabor filter-based one-dimensional local phase descriptors for obstructive sleep apnea detection using single-lead ECG,” IEEE Sensors Letters, Vol. 2, no. 1, pp. 1–4, 2018.

41. K. Samiee, P. Kovács, and M. Gabbouj, “Epileptic seizure detection in long-term EEG records using sparse rational decomposition and local Gabor binary patterns feature extraction,” Knowl. Based. Syst., Vol. 118, pp. 228–240, 2017. DOI:10.1016/j.knosys.2016.11.023

42. S. S. Gupta, and R. R. Manthalkar, “Classification of visual cognitive workload using analytic wavelet transform,” Biomed. Signal. Process. Control., Vol. 61, pp. 101961, 2020. DOI:10.1016/j.bspc.2020.101961

43. M. Y. Ladekar, S. S. Gupta, Y. V. Joshi, and R. R. Manthalkar, “EEG based visual cognitive workload analysis using multirate IIR filters,” Biomed. Signal. Process. Control., Vol. 68, pp. 102819, 2021. DOI:10.1016/j.bspc.2021.102819

44. J. Sola, and J. Sevilla, “Importance of input data normalization for the application of neural networks to complex industrial problems,” IEEE Trans. Nucl. Sci., Vol. 44, no. 3, pp. 1464–1468, 1997. DOI:10.1109/23.589532

45. Q. Wang, and O. Sourina, “Real-time mental arithmetic task recognition from EEG signals,” IEEE Trans. Neural Syst. Rehabil. Eng., Vol. 21, no. 2, pp. 225–232, 2013. DOI:10.1109/TNSRE.2012.2236576

46. S. Molaei, and M. E. Shiri Ahmad Abadi, “Maintaining filter structure: A Gabor-based convolutional neural network for image analysis,” Appl. Soft. Comput., Vol. 88, pp. 105960, 2020. DOI:10.1016/j.asoc.2019.105960

47. M. Hardt, and T. Ma, “Identity matters in deep learning,” arXiv preprint arXiv:1611.04231, 2016.

48. J. V. Dillon, et al., “TensorFlow distributions,” arXiv preprint arXiv:1711.10604, 2017.

49. A. Craik, Y. He, and J. L. Contreras-Vidal, “Deep learning for electroencephalogram (EEG) classification tasks: a review,” J. Neural Eng., Vol. 16, no. 3, pp. 031001, 2019. DOI:10.1088/1741-2552/ab0ab5