Computers in Biology and Medicine 151 (2022) 106320

Contents lists available at ScienceDirect

Computers in Biology and Medicine

journal homepage: www.elsevier.com/locate/compbiomed

Classifying ASD based on time-series fMRI using spatial–temporal transformer✩

Xin Deng a, Jiahao Zhang a, Rui Liu b,∗, Ke Liu a
a The Key Laboratory of Data Engineering and Visual Computing, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
b Department of Computer Science, City University of Hong Kong, 999077, Hong Kong, China

ARTICLE INFO

Keywords:
Autism spectrum disorder (ASD)
Functional magnetic resonance imaging (fMRI)
Deep learning (DL)
Transformer
Adversarial Generation Network (GAN)
ABIDE

ABSTRACT

As the prevalence of autism spectrum disorder (ASD) increases globally, more and more patients need to receive timely diagnosis and treatment to alleviate their suffering. However, the current diagnosis method of ASD still adopts the subjective symptom-based criteria through clinical observation, which is time-consuming and costly. In recent years, functional magnetic resonance imaging (fMRI) neuroimaging techniques have emerged to facilitate the identification of potential biomarkers for diagnosing ASD. In this study, we developed a deep learning framework named spatial–temporal Transformer (ST-Transformer) to distinguish ASD subjects from typical controls based on fMRI data. Specifically, a linear spatial–temporal multi-headed attention unit is proposed to obtain the spatial and temporal representation of fMRI data. Moreover, a Gaussian GAN-based data balancing method is introduced to solve the data unbalance problem in real-world ASD datasets for subtype ASD diagnosis. Our proposed ST-Transformer is evaluated on a large cohort of subjects from two independent datasets (ABIDE I and ABIDE II) and achieves robust accuracies of 71.0% and 70.6%, respectively. Compared with state-of-the-art methods, our results demonstrate competitive performance in ASD diagnosis.

1. Introduction

Autism spectrum disorder (ASD) is a widespread mental illness characterized by brain development disorder, especially among adolescents. Individuals with ASD typically experience difficulties with emotional, verbal, or non-verbal expression in social interactions, as well as marked interest in restrictive behaviors and repetitive movements. In developed countries, about 1.5% of children are diagnosed with ASD [1]. Recent increases in ASD prevalence have placed a serious strain on society and the families of ASD sufferers. However, the symptom-based [2] method requires physicians with considerable training and solid expertise to make an accurate diagnosis. During the assessment process, personal observations and subjective decisions tend to misdiagnose or overdiagnose mild cases, as demonstrated by a recent study [3]. There is an urgent need to implement effective computer-aided diagnosis (CAD) technology to assist physicians in diagnosis.

With the development of neuroimaging technology, a large number of neuroimaging techniques have emerged to detect functional impairment of the brain, e.g., electroencephalogram, near-infrared spectroscopy, and functional magnetic resonance imaging (fMRI). fMRI, as an emerging non-invasive neuroimaging technology, provides a better spatial resolution and is widely used to detect structural and functional changes in the brain [4]. The principle of fMRI is to use magnetic resonance imaging to measure the hemodynamic changes induced by neuronal activity. In fMRI data, the volume of the brain is represented by a set of small cubic elements called voxels. Tracking the activity of each voxel over time yields a time series. Growing evidence suggests that fMRI signals have great potential for ASD identification [5]. For example, Li et al. [6] used fMRI data to effectively model brain connectivity in ASD subjects through a graph neural network approach, which can facilitate the understanding of the neural activity of ASD. Similarly, an individual brain network was constructed to obtain feature representation, and then the features were fed to a deep neural network classifier to perform ASD classification [7]. However, most existing CAD methods require elaborate feature extraction, which is a time-consuming and labor-intensive process.

✩ This work was supported in part by the Natural Science Foundation of Chongqing, China under Grant cstc2020jcyj-msxmX0284; in part by The Science
and Technology Research Program of Chongqing Municipal Education Commission, China under Grant KJQN202000625; in part by the National Natural Science
Foundation of China under Grant 61806033, Grant 61703065; and in part by the Educational Reform Project of CQUPT, China under Grant XJG20207.
∗ Corresponding author.
E-mail address: [email protected] (R. Liu).

https://doi.org/10.1016/j.compbiomed.2022.106320
Received 3 March 2022; Received in revised form 11 October 2022; Accepted 14 November 2022
Available online 17 November 2022
0010-4825/© 2022 Elsevier Ltd. All rights reserved.

The attention mechanism [8], a well-known deep learning technique, has been widely adopted in natural language processing (NLP) [9–12], computer vision (CV) [13–16], and speech processing [17,18]. The attention-based approach, which simplifies the complex feature extraction process by focusing on important parts during neural network training, has also been applied to develop advanced CAD methods for brain disease diagnosis [19–21]. Since most ASD datasets are collected from different clinical sites with different sampling periods, there is a complex multi-site data problem. To address the multi-site data problem of ASD, researchers used multi-head attention to compute independent features in parallel and then concatenated these independent features. This parallel structure can obtain information about fMRI data from different perspectives and solves the problem of multi-site data to a certain extent [22]. However, the above methods only addressed the multi-site data problem of ASD in a static way, where the temporal characteristic of fMRI data is not considered [23]. In addition, the diagnosis of ASD subtypes is also crucial for planning the treatment of ASD patients. The significant data imbalance in ASD subtypes raises several issues, including poor diagnostic performance and imbalanced specificity and sensitivity in current subtype diagnostic studies.

To address the above issues, an end-to-end deep learning framework named spatial–temporal Transformer (ST-Transformer) is proposed to effectively distinguish ASD with subtypes based on time-series fMRI data. To be specific, a linear spatial–temporal multi-headed attention (LSTMA) unit is proposed to simultaneously learn joint feature representation on the spatial–temporal domain. In addition, for ASD subtype diagnosis, an approach named Gaussian-GAN data balancing (GGDB) is proposed to address the data unbalance issue in real-world ASD datasets. Comprehensive experiments are conducted for performance evaluation based on two ASD datasets (i.e., Autism Brain Imaging Data Exchange (ABIDE) I/II). The contributions of this paper can be summarized as follows:

(1) A linear spatial–temporal multi-headed attention (LSTMA) unit is introduced into this work. Specifically, a linear multi-headed attention method is applied to ROI-averaged time series from both spatial and temporal perspectives. With the help of the attention mechanism, the LSTMA unit is able to hierarchically complement fMRI features in the spatial and temporal domains to learn fine-grained feature representation for facilitating diagnosis performance. In addition, by using the LSTMA unit, the training process of neural networks can be accelerated.

(2) To address the data imbalance problem of ASD subtype samples, we propose a GGDB strategy. Our GGDB is capable of learning the hidden representation and distribution of the original data to generate pseudo data samples.

(3) Based on the LSTMA unit and GGDB, the end-to-end ST-Transformer is constructed to diagnose ASD with subtypes. The effectiveness and reliability of the proposed methods are validated on two real-world datasets. Compared with state-of-the-art methods, our results show competitive performance in ASD diagnosis.

The rest of this paper is organized as follows. In Section 2, we briefly review previous studies on fMRI-based CAD methods for ASD diagnosis. Section 3 describes the studied datasets and data preprocessing. In Section 4, we introduce the details of the proposed ST-Transformer framework and the data balancing strategy for ASD subtypes. The experimental results and analysis are discussed in Section 5. Finally, the paper is concluded in Section 6.

2. Related works

In this section, we briefly review previous work on fMRI-based CAD methods for ASD diagnosis. The existing CAD methods are mainly divided into two categories: conventional learning-based methods and deep learning-based methods.

2.1. Conventional learning-based methods

Conventional learning-based CAD methods typically apply machine learning methods to perform diagnosis tasks based on fMRI features. Functional connectivity (FC), the temporal correlation between blood oxygen level-dependent signals from separate brain regions, is the most commonly used fMRI feature for CAD models. FC can reflect the functional interactions among different brain regions. Thanks to the strong ability of FC to characterize connection patterns of brain activity, numerous conventional learning-based studies are built on FC measures. In particular, many conventional learning CAD methods based on FC measures proceed in two steps (feature selection and classifier construction).

Typically, feature extraction is performed to find potential biomarkers for ASD identification. Then, a well-constructed classifier is adapted to perform the ASD classification task. For example, Wang et al. [24] chose 35 ROIs to construct the FC matrix, selected the optimal features, and used a support vector machine (SVM) with a Gaussian kernel to identify fMRI scans of ASD. Sadeghian et al. [25] constructed a whole-brain FC with feature dimension reduction by a genetic algorithm and achieved the final ASD diagnosis using a k-nearest neighbor classifier. Wang et al. [26] used a similarity-driven multi-view linear reconstruction model to learn potential representations and perform topic clustering in ASD and healthy controls. Then, a nested singular value decomposition method was designed to extract FC features. Finally, the extracted FC features were fed to a linear SVM classifier for ASD detection. Yap et al. [27] proposed a framework based on penalized SVM clusters that combine the selection of significant FCs from the original FCs as input features for the final SVM classifier. To explore the FC of the brain, Ma et al. [28] used the Hilbert transform to determine the phase synchrony among brain regions. Principal component analysis and SVM were utilized to develop a discriminant model for identifying ASD.

Although conventional learning-based methods achieve good classification results on homogeneous data sites, well-designed feature selection (dimensionality reduction) methods are still the core of their good performance. Moreover, these data-dependent feature selection methods are time-consuming, labor-intensive, and lack generality.

2.2. Deep learning-based methods

Deep learning has been successfully applied in brain disease diagnosis [29–32]. In FC-based deep learning, many researchers directly use deep learning models to capture potential biomarkers, rather than carefully designed feature extraction algorithms. For example, Leming et al. [33] used the FC matrix extracted from a multi-source fMRI connected group dataset to train a convolutional neural network (CNN) for ASD diagnosis. Heinsfeld et al. [34] designed a stacked denoising autoencoder to find the hidden patterns of FC for further diagnosis of ASD. Eslami et al. [35] also combined an autoencoder with a single-layer perceptron to perform FC selection and classification in an end-to-end manner. However, FC-based methods for ASD diagnosis do not consider the time-series nature of fMRI data and therefore lose the temporal variation information.

In recent years, recurrent neural network (RNN), long short-term memory (LSTM), and attention mechanism-based methods, which can capture feature information in the temporal domain, have shown great potential in the diagnosis of brain diseases. For example, Liu et al. [36] developed a novel multi-network of LSTM for the identification of attention deficit hyperactivity disorder (ADHD). Dvornek et al. [37] proposed a recurrent neural network with LSTM for the classification of individuals with ASD and typical controls directly from the fMRI time series. A framework with an RNN unit was presented by Byeon et al. [38] to extract temporal properties of fMRI data for multi-site ASD classification. The RNN, LSTM, CNN, and multiple hybrid models were proposed together for the diagnosis of ASD by Bayram et al. [39]. It was shown that the RNN exhibited better performance than the other methods. Niu et al. [23] proposed a multichannel deep attention neural network, which integrates multilayer neural networks, attention mechanisms, and feature fusion for ASD recognition. Zhang


Table 1
Demographic information on ABIDE I.

| Site | ASD: Age avg(SD) | ASD: Sex (M/F) | ASD: Handedness (L/M/R) | ASD: FIQ avg(SD) | TC: Age avg(SD) | TC: Sex (M/F) | TC: Handedness (L/M/R) | TC: FIQ avg(SD) |
|---|---|---|---|---|---|---|---|---|
| CALTECH | 27.4(10.3) | 15/4 | 0/5/14 | 108.2(12.2) | 28.0(10.9) | 14/4 | 1/3/14 | 114.8(9.3) |
| CMU | 26.4(5.8) | 11/3 | 1/1/12 | 114.5(11.2) | 26.8(5.7) | 10/3 | 0/1/12 | 114.6(9.3) |
| KKI | 10.0(1.4) | 16/4 | 1/3/16 | 97.9(17.1) | 10.0(1.2) | 20/8 | 1/3/24 | 112.1(9.2) |
| LEUVEN | 17.8(5.0) | 26/3 | 3/0/26 | 109.4(12.6) | 18.2(5.1) | 29/5 | 4/0/30 | 114.8(12.4) |
| MAX_MUN | 26.1(14.9) | 21/3 | 2/0/22 | 109.9(14.2) | 24.6(8.8) | 27/1 | 0/0/28 | 111.8(9.1) |
| NYU | 14.7(7.1) | 65/10 | N/A | 107.1(16.3) | 15.7(6.2) | 74/26 | N/A | 113.0(13.3) |
| OLIN | 16.5(3.4) | 16/3 | 4/0/15 | 112.6(17.8) | 16.7(3.6) | 13/2 | 2/0/13 | 113.9(16.0) |
| PITT | 19.0(7.3) | 25/4 | 3/1/25 | 110.2(14.3) | 18.9(6.6) | 23/4 | 1/1/25 | 110.1(9.2) |
| SBL | 35.0(10.4) | 15/0 | N/A | N/A | 33.7(6.6) | 15/0 | N/A | N/A |
| SDSU | 14.7(1.8) | 13/1 | 1/0/13 | 111.4(17.4) | 14.2(1.9) | 16/6 | 3/0/19 | 108.1(10.3) |
| STANFORD | 10.0(1.6) | 15/4 | 3/1/15 | 110.7(15.7) | 10.0(1.6) | 16/4 | 0/2/18 | 112.1(15.0) |
| TRINITY | 16.8(3.2) | 22/0 | 0/0/22 | 108.9(15.2) | 17.1(3.8) | 25/0 | 0/0/25 | 112.5(9.2) |
| UCLA | 13.0(2.5) | 48/6 | 6/0/48 | 100.4(13.4) | 13.0(1.9) | 38/6 | 4/0/40 | 106.4(11.1) |
| UM | 13.2(2.4) | 57/9 | 7/8/51 | 105.5(17.1) | 14.8(3.6) | 56/18 | 9/2/63 | 108.2(9.7) |
| USM | 23.5(8.3) | 46/0 | N/A | 99.7(16.4) | 21.3(8.4) | 25/0 | N/A | 115.4(14.8) |
| YALE | 12.7(3.0) | 20/8 | 5/0/23 | 94.6(21.2) | 12.7(2.8) | 20/8 | 4/0/24 | 105.0(17.1) |

FIQ: Full Scale Intelligence Quotient; SD: Standard Deviation; TC: Typical Control; Avg: Average.

et al. [20] proposed a new two-stage network structure for the classification of ADHD by combining a split-channel convolutional network with an attention-based network. The split-channel convolutional network was used to learn temporal features of each brain region, while the attention-based network was used to discover temporal correlation features among brain regions and extract fusion features.

Although temporal deep learning methods have made great progress, they only focus on temporal information while ignoring the spatial domain. Temporal domain-based deep learning models fail to adequately utilize the information from fMRI data. As a result, they do not generalize well, which limits their classification performance. In this paper, we propose an ST-Transformer deep learning framework that attends to the information in the spatio-temporal domain of fMRI data for the diagnosis of ASD.

3. Materials

In this section, we introduce the fMRI datasets, the image pre-processing pipeline, and the data augmentation strategy used in our study.

3.1. Data preprocessing

In this study, the preprocessed ABIDE I dataset is downloaded from [40]. The ABIDE I dataset is collected from 17 different sites with 1112 subjects, including 539 patients with ASD and 573 typical controls. A total of 1035 subjects with complete labeling information are available. To fix the time length of the ROI-averaged time series, 1009 valid subjects are selected in this work. The detailed ABIDE I dataset information is presented in Table 1. The preprocessed fMRI data selected in this work employs the Configurable Pipeline for the Analysis of Connectomes (C-PAC) pipeline with Craddock 200 (CC200) functional parcellation.

Besides, we also selected valid fMRI data from ABIDE II to validate the generalization ability of our proposed model. ABIDE II involves 19 sites containing a combined sample of 1114 subjects, consisting of 521 ASD subjects and 593 healthy controls. To fix the time length of the ROI-averaged time series, 1058 valid subjects are selected in this work. The detailed ABIDE II dataset information is presented in Table 2. However, there is no processed ABIDE II data available online. To preprocess the raw fMRI data, the pipeline of the Data Processing Assistant for Resting-State fMRI (DPARSF) [41] is used in this work. The preprocessing steps in the DPARSF pipeline include removal of the first few volumes, slice timing correction and realignment, motion correction, spatial normalization, bandpass filtering, normalization by the MNI template, and smoothing with a Gaussian kernel of 6-mm full width at half maximum (FWHM). The details of the preprocessing pipeline are available on the website (https://rfmri.org/DPARSF). In this work, we used the CC200 functional parcellation atlas to partition the whole brain into 200 ROIs for extracting the time series data.

3.2. Data augmentation

Although ABIDE provides over a thousand subjects, training neural network models tends to require a large amount of data to prevent overfitting. We adopt the same simple cropping data augmentation strategy as in [37]. Since the time length of the data obtained at each site is different, we fix a time length of 90 as the final data length to ensure that the time length of our sliced data remains consistent. We crop 10 sequences from the ROI-averaged time series of each subject using randomly placed sliding windows. The amount of fMRI data is thereby increased tenfold for further model training.

4. Model design

The overall flowchart of the proposed model is shown in Fig. 1. Transformer is applied as the backbone of our proposed model owing to its multi-headed self-attention mechanism, which can attend to the global information of the fMRI time series. Based on the Transformer, an LSTMA unit is designed in our proposed ST-Transformer to capture the spatio-temporal properties of fMRI data. In addition, we propose a GGDB method to address the data imbalance problem of ASD subtype samples.

4.1. Preliminaries

The overall structure of the Transformer-encoder consists of a multi-headed self-attention module, a position-wise feed-forward network (FFN), residual connections, and layer normalization, as shown in Fig. 2a. The self-attention mechanism is the core of the Transformer-encoder and is computed from the Query, Key, and Value matrices. Given packed matrices representing queries $Q \in \mathbb{R}^{N \times D_k}$, keys $K \in \mathbb{R}^{M \times D_k}$, and values $V \in \mathbb{R}^{M \times D_v}$, the scaled dot-product attention is formulated by Eq. (1),

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{D_k}}\right)V \tag{1}$$

where $N$ and $M$ denote the lengths of queries and keys (or values), and $D_k$ and $D_v$ denote the dimensions of keys (or queries) and values. Softmax is an activation function that converts the attention scores into probabilities. The dot product of $Q$ and $K^{\top}$ is divided by $\sqrt{D_k}$ to alleviate the gradient vanishing problem. In contrast to the simple single attention function, Transformer applies a multi-headed self-attention


Table 2
Demographic information on ABIDE II.

| Site | ASD: Age avg(SD) | ASD: Sex (M/F) | ASD: Handedness (L/M/R) | ASD: FIQ avg(SD) | TC: Age avg(SD) | TC: Sex (M/F) | TC: Handedness (L/M/R) | TC: FIQ avg(SD) |
|---|---|---|---|---|---|---|---|---|
| BNI | 37.4(15.8) | 29/0 | 0/0/29 | 107.8(13.5) | 39.6(14.8) | 29/0 | 0/0/29 | 112.4(11.9) |
| EMC | 8.0(1.2) | 22/5 | 5/0/22 | N/A | 8.1(1.0) | 22/5 | 6/0/21 | N/A |
| ETH | 20.6(3.3) | 13/0 | 0/0/13 | 109(12.5) | 23.9(4.4) | 24/0 | 0/0/24 | 116.5(9.3) |
| GU | 10.9(1.5) | 43/8 | 8/0/43 | 118.3(15.2) | 10.4(1.7) | 28/27 | 3/0/52 | 121.5(13.7) |
| IU | 25.0(9.1) | 16/4 | 2/3/15 | 116.3(11.5) | 23.8(4.8) | 15/5 | 1/2/17 | 117(10.4) |
| KKI | 10.3(1.5) | 41/15 | 2/8/46 | 103.4(15.8) | 10.3(1.2) | 99/56 | 10/12/133 | 114.3(10.5) |
| KUL | 23.6(4.8) | 28/0 | 6/0/22 | 106.6(15.5) | N/A | N/A | N/A | N/A |
| NYU | 8.9(4.8) | 67/8 | 6/15/54 | 103.8(16.8) | 9.5(3.3) | 28/2 | 0/1/29 | 116.1(15.4) |
| OHSU | 11.8(2.2) | 30/7 | 1/1/35 | 106.0(16.5) | 10.4(1.6) | 27/29 | 0/1/55 | 117.5(11.9) |
| OILH | 21.8(3.6) | 20/4 | 4/4/16 | 114.0(15.9) | 24.0(3.6) | 20/15 | 0/3/32 | 111.2(12.7) |
| SDSU | 12.9(3.2) | 26/7 | 4/2/27 | 99.8(14.5) | 13.3(3.0) | 23/2 | 1/3/21 | 103.0(11.5) |
| SU | 11.2(1.2) | 19/2 | 0/0/21 | 111.8(15.4) | 11.0(1.3) | 19/2 | 0/3/18 | 116.1(13.7) |
| TCD | 14.9(3.2) | 21/0 | 0/0/21 | 108.5(15.0) | 15.6(3.0) | 21/0 | 0/0/21 | 118.5(12.9) |
| UCD | 14.8(1.9) | 14/4 | 0/1/17 | 103.4(11.8) | 14.8(1.7) | 10/4 | 0/0/14 | 113.0(10.8) |
| UCLA | 11.7(2.2) | 15/1 | 2/0/14 | 102.1(13.5) | 9.7(2.1) | 11/5 | 1/1/14 | 114.5(13.4) |
| U_MIA | 9.9(1.9) | 11/2 | 0/1/12 | 100.8(19.2) | 9.7(2.1) | 11/4 | 0/0/15 | 115.9(14.2) |
| USM | 18.3(6.8) | 15/2 | 0/2/15 | 99.3(19.1) | 24.0(7.5) | 13/3 | 0/1/15 | 115.2(14.1) |

Fig. 1. The flowchart of the presented model.
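
Two stages of this pipeline are simple enough to sketch directly: the random-window cropping of Section 3.2 (10 windows of 90 time points per subject) and the subject-level majority vote of Section 5.1. The NumPy sketch below is illustrative only and is not the authors' code. The function names, the synthetic 150 × 200 input, and the rule that a tied vote resolves to class 0 are all assumptions.

```python
import numpy as np

def crop_windows(ts, window=90, n_crops=10, rng=None):
    """Randomly crop n_crops windows of `window` time points from a (T, N) series."""
    rng = np.random.default_rng() if rng is None else rng
    T = ts.shape[0]
    if T < window:
        raise ValueError("time series is shorter than the window length")
    # Start indices are drawn uniformly from 0 .. T - window (inclusive).
    starts = rng.integers(0, T - window + 1, size=n_crops)
    return np.stack([ts[s:s + window] for s in starts])  # (n_crops, window, N)

def subject_vote(sequence_labels):
    """Majority vote over binary labels predicted for one subject's sequences.

    Assumption: a tie is resolved as class 0; the paper does not state a rule.
    """
    labels = np.asarray(sequence_labels)
    return int(2 * labels.sum() > labels.size)

rng = np.random.default_rng(0)
ts = rng.standard_normal((150, 200))  # hypothetical subject: 150 time points, 200 ROIs
crops = crop_windows(ts, rng=rng)     # ten 90 x 200 training sequences
```

Each crop is a contiguous slice of the original series, so the windows of one subject can overlap. For that reason any train/test split would normally be made per subject rather than per sequence.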

mechanism. The rationale behind multi-headed self-attention is that the query, key, and value matrices of the original dimension $D_m$ are mapped to $h$ different sets of dimensions $D_k$, $D_k$, $D_v$ by $h$ different projection mappings. For each projection of the query, key, and value matrices, the output is calculated as shown in Eq. (1). Finally, the $h$ different outputs are concatenated and projected back to the original dimension $D_m$, which gives the output of multi-headed self-attention. The multi-headed self-attention can be given by Eq. (2),

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}\left(\mathrm{head}_1, \ldots, \mathrm{head}_h\right)W^{O}, \quad \text{where } \mathrm{head}_i = \mathrm{Attention}\left(QW_i^{Q}, KW_i^{K}, VW_i^{V}\right) \tag{2}$$

where the projections are parameter matrices $W_i^{Q} \in \mathbb{R}^{N \times D_k}$, $W_i^{K} \in \mathbb{R}^{N \times D_k}$, $W_i^{V} \in \mathbb{R}^{M \times D_v}$, and $W^{O} \in \mathbb{R}^{hD_v \times N}$. $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ are the weight matrices corresponding to the $Q$, $K$, and $V$ vectors, respectively. $W^{O}$ is the weight matrix applied to the attention outputs concatenated from the multiple heads. Then, the FFN applies two linear transformations with a ReLU activation function to the output of multi-headed self-attention as follows,

$$\mathrm{FFN}(x) = \mathrm{ReLU}\left(xW_1 + b_1\right)W_2 + b_2 \tag{3}$$

where $x$ denotes the output of the previous layer, and $W_1$, $W_2$, $b_1$, $b_2$ denote the trainable parameters. Finally, because deep neural network models are difficult to train, two residual connections with layer normalization are applied to the outputs of the multi-headed self-attention and the FFN, respectively.

4.2. Spatial–temporal transformer

Based on the Transformer-encoder, a novel ST-Transformer is proposed to hierarchically complement fMRI features in both spatial and temporal domains to learn fine-grained feature representation for facilitating diagnosis performance. The architecture of ST-Transformer is shown in Fig. 2b. ST-Transformer is a variant of the Transformer-encoder in which a linear spatial–temporal multi-headed attention (LSTMA) unit replaces the self-attention mechanism. The self-attention mechanism in the vanilla Transformer is more likely to focus on the information of the sequence data in the temporal domain, whereas the spatial information of the sequence data is ignored by traditional self-attention.

To extract the spatial–temporal dependency of sequential data, a spatial–temporal multi-headed self-attention (STMA) unit is proposed, as shown in Fig. 2b. The STMA unit obtains the spatio-temporal feature representation of fMRI data by first conducting spatial self-attention and then temporal self-attention. Besides, the data volume of the fMRI time series is relatively large, and self-attention requires a large number


Fig. 2. Architectures of the Transformer encoder and ST-Transformer. 𝑁 represents the number of layers. 𝑄, 𝐾, and 𝑉 are Query vector, Key vector, and Value vector respectively.
𝜙(⋅) is a linear transformation. (⋅) is the dot product. 𝑍 is the output matrix of 𝑄, 𝐾, 𝑉 calculated by 𝐿𝐴.
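
As a rough illustration of the blocks named in Fig. 2, the NumPy sketch below puts the softmax attention of Eq. (1) next to the linear attention of Eqs. (4) and (5), and chains a spatial and a temporal pass in the order given by Eq. (6). It is a single-head simplification under stated assumptions, not the authors' implementation. The paper's unit is multi-headed, and the helper names (`attend`, `lstma`), the random projection weights, and the toy sizes are all hypothetical. Eq. (4) is implemented as printed, without a normalizing denominator.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def softmax_attention(Q, K, V):
    """Eq. (1): softmax(Q K^T / sqrt(D_k)) V, via an explicit (N, M) score matrix."""
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

def phi(x):
    """Eq. (5): elu(x) + 1, i.e. x + 1 for x > 0 and exp(x) otherwise."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(Q, K, V):
    """Eq. (4) as printed: phi(Q) (phi(K)^T V), without a normalising term."""
    return phi(Q) @ (phi(K).T @ V)  # (D_k, D_v) summary first, then (N, D_v)

def attend(tokens, Wq, Wk, Wv):
    """Single-head linear self-attention over the rows of `tokens` (assumed form)."""
    return linear_attention(tokens @ Wq, tokens @ Wk, tokens @ Wv)

def lstma(x, spatial_w, temporal_w):
    """Eq. (6): spatial attention, transpose, then temporal attention."""
    xs = attend(x.T, *spatial_w)      # N region tokens of length T -> (N, T)
    return attend(xs.T, *temporal_w)  # T time-point tokens of length N -> (T, N)

rng = np.random.default_rng(0)
T, N = 6, 4                                   # toy sizes, not the paper's 90 x 200
x = rng.standard_normal((T, N))
spatial_w = [0.1 * rng.standard_normal((T, T)) for _ in range(3)]
temporal_w = [0.1 * rng.standard_normal((N, N)) for _ in range(3)]
out = lstma(x, spatial_w, temporal_w)

Q, K, V = (rng.standard_normal((T, N)) for _ in range(3))
quadratic = (phi(Q) @ phi(K).T) @ V           # same product, (T, T) matrix first
linear = linear_attention(Q, K, V)
```

The two groupings of the matrix product give the same result. Multiplying `phi(K).T @ V` first avoids forming the token-by-token matrix, which is where the reduction from quadratic to linear cost in sequence length comes from.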

of computational resources and excessive time to train the model on a large dataset. Therefore, to speed up the model training process, we propose a linear attention module to replace the self-attention unit in Transformer. For the given matrices of queries $Q \in \mathbb{R}^{N \times D_k}$, keys $K \in \mathbb{R}^{M \times D_k}$, and values $V \in \mathbb{R}^{M \times D_v}$, $K$ is passed through a feature map and multiplied with $V$. The final attention result is obtained by multiplying this product with the feature-mapped $Q$. Linear attention can be written as,

$$\mathrm{LA}(Q, K, V) = \phi(Q) \cdot \left(\phi(K)^{\top} \cdot V\right) \tag{4}$$

where $\phi$ is a feature map that is applied in a row-wise manner. In this study, we select the feature map given by Eq. (5),

$$\phi(x) = \mathrm{elu}(x) + 1 \tag{5}$$

where $\mathrm{elu}$ [42] is an activation function. Comparing Eqs. (1) and (4), replacing self-attention with linear attention decreases the time complexity of computing attention from $O(n^2)$ to $O(n)$, which significantly reduces the model training time.

Based on the STMA and linear attention module, we propose the LSTMA unit to efficiently learn the spatio-temporal feature representation of fMRI time-series data. The LSTMA unit consists of spatial multi-headed linear attention and temporal multi-headed linear attention. Given $x \in \mathbb{R}^{T \times N}$ as input, spatial multi-headed linear attention takes each column ($1 \times T$) as one token. Therefore, the spatial multi-headed linear attention can learn the dependency between tokens (brain regions). After the spatial multi-headed linear attention, the dimensionality of the adjusted $x$ is $N \times T$, and temporal multi-headed linear attention takes each row ($1 \times N$) as one token. The temporal multi-headed linear attention can learn correlations between tokens (time points) simultaneously. The detailed operation of the LSTMA unit can be written as,

$$\Phi(x) = \varphi_1\left(\varphi_2(x)^{\top}\right) \tag{6}$$

where $x$ refers to the ROI-averaged time series, $\varphi_1$ refers to temporal multi-headed linear attention, and $\varphi_2$ refers to spatial multi-headed linear attention. To ease the understanding of the ST-Transformer framework, the pseudo-code is shown in Algorithm 1.

Algorithm 1: Pseudo-code of the ST-Transformer
Input: Preprocessed data X_train, X_test, phenotypic data X_p, and labels Y_train, Y_test
Output: Predicted probabilities of the test set Y_test^pred
Initialize ST-Transformer;
// n: number of training epoch
for n = 1, ..., epochs do
    // Q, K, V are obtained by linear transformation of X_train
    // φ is a feature map
    LA(Q, K, V) ← φ(Q) · (φ(K)^⊤ · V);
    // head represents a pass through LA
    MultiHead(Q, K, V) ← Concat(head_1, ..., head_h) W^O;
    // spatial multi-headed linear attention
    X_s ← MultiHead(Q, K, V);
    // temporal multi-headed linear attention
    X_t ← MultiHead(Q, K, V)′;
    // LN: Layer Normalization, RC: Residual Connection
    X_l1 ← LN(RC(X_train, X_t));
    // FFN: feed-forward network
    X_f ← FFN(X_l1);
    X_l2 ← LN(RC(X_l1, X_f));
    X′ ← Concatenate(X_l2, X_p);
    output ← FullyConnectNetworks(X′);
    Loss ← CrossEntropyLoss(output, Y_train);
    ST-Transformer.update(Loss)
end
Y_test^pred ← ST-Transformer.predict(X_test);

4.3. Gaussian GAN-based data balancing strategy for ASD subtypes

ASD, as a spectrum of psychiatric disorders, can be further divided into subtypes, including autism, Asperger, and pervasive developmental disorder not otherwise specified (PDD-NOS). In this study, we attempt to explore the possibility of identifying ASD subtypes. Due to the paucity of ASD subtype fMRI data, we combine the fMRI data with ASD subtype labels from the ABIDE I and ABIDE II datasets. Table 3 shows the number of samples of each specific ASD subtype. Although combining the ABIDE I and ABIDE II datasets increases the sample size of the ASD subtype fMRI data, the data volume for Asperger and PDD-NOS is still small compared with autism. This causes a significant data imbalance problem, which makes the model biased to

5
X. Deng et al. Computers in Biology and Medicine 151 (2022) 106320

Fig. 3. Details of the GGDB.
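
To make the data flow of GGDB concrete, the NumPy sketch below runs one forward pass of a generator and a discriminator with the layer order described in Section 4.3, and evaluates a Monte Carlo estimate of the objective in Eq. (7). It is a minimal sketch under assumptions, not the authors' model. The layer widths, the leaky-ReLU slope, the noise scale `sigma`, and the random stand-in data are hypothetical. Batch normalization is used without learned scale and shift, dropout is omitted, and no training loop is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def batch_norm(x, eps=1e-5):
    """Training-mode batch normalisation without learned scale and shift."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def init(d_in, d_out):
    """Small random weights and zero bias for one fully connected layer."""
    return 0.1 * rng.standard_normal((d_in, d_out)), np.zeros(d_out)

def generator(z, params):
    (w1, b1), (w2, b2) = params
    h = batch_norm(leaky_relu(z @ w1 + b1))  # FC -> Leaky ReLU -> batch norm
    return h @ w2 + b2                        # second FC layer

def discriminator(x, params):
    (w1, b1), (w2, b2) = params
    h = leaky_relu(x @ w1 + b1)               # dropout omitted in this sketch
    return sigmoid(h @ w2 + b2).ravel()       # probability that x is real

def ggdb_value(x_real, z, g_params, d_params, sigma=0.1):
    """Estimate of Eq. (7); Gaussian noise tau is added to G(z) before D sees it."""
    fake = generator(z, g_params)
    tau = rng.normal(0.0, sigma, size=fake.shape)
    d_real = discriminator(x_real, d_params)
    d_fake = discriminator(fake + tau, d_params)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

noise_dim, hidden, data_dim, batch = 16, 32, 64, 8   # hypothetical sizes
g_params = [init(noise_dim, hidden), init(hidden, data_dim)]
d_params = [init(data_dim, hidden), init(hidden, 1)]
x_real = rng.standard_normal((batch, data_dim))      # stand-in for minority-class samples
z = rng.standard_normal((batch, noise_dim))
fake = generator(z, g_params)
value = ggdb_value(x_real, z, g_params, d_params)
```

In training, the discriminator parameters would be updated to increase this value and the generator parameters to decrease it, as in a standard GAN.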

the majority class in the process of training. Generative adversarial networks (GAN), as a deep learning method that can solve generative modeling problems, have been successfully applied in a variety of research tasks. In particular, in many small-sample tasks, GAN is used for data augmentation to increase the sample size of a minority class, which can efficiently alleviate data imbalance. GAN consists of two models, a generator and a discriminator. These two models are typically implemented using neural networks to map data from the initial feature space to the hidden representation space. The generator tries to capture the distribution of true examples to generate new data examples. The discriminator is usually a binary classifier used to discriminate generated examples from true examples as accurately as possible. Inspired by the GAN model, GGDB is proposed to increase the number of Asperger and PDD-NOS samples. To be specific, the Asperger and PDD-NOS samples are first expanded using the data augmentation method in Section 3.2. Then, the minority class samples are further enlarged using the GGDB method. In this way, the number of Asperger and PDD-NOS samples is kept consistent with that of the autism samples, which use a slicing strategy, during the training of the model.

Table 3
Domains of impairment in ASD and sample numbers of ASD subtypes in ABIDE.

| Domain | Autism | Asperger | PDD-NOS |
|---|---|---|---|
| Social communication | required | required | required |
| Language | required | normal | variable |
| Repetitive behaviors | required | required | variable |
| Sample # in ABIDE | 412 | 181 | 102 |

The architecture of GGDB is shown in Fig. 3. The generator of GGDB consists of two different fully connected layers. Note that a leaky rectified linear unit (Leaky ReLU) layer and batch normalization are sequentially added between the two fully connected layers. Compared with GAN, GGDB adds Gaussian noise to the synthetic samples produced by the generator before the discriminator discriminates the synthetic data. Adding Gaussian noise aims to capture the distribution of the true samples as closely as possible. It enables the distribution of the generated data to be closer to the true sample distribution. The discriminator is composed of two fully connected layers. Leaky ReLU and dropout regularization are used after each layer to reduce overfitting. A sigmoid activation function is used for the last layer. The objective function of GGDB can be expressed as follows,

$$\min_{G} \max_{D} \; \mathbb{E}_{\boldsymbol{x} \sim p_{data}}\left[\log D(\boldsymbol{x})\right] + \mathbb{E}_{\boldsymbol{z} \sim p_{\boldsymbol{z}}}\left[\log\left(1 - D\left(G(\boldsymbol{z}) + \tau\right)\right)\right] \tag{7}$$

where $G$ denotes the generator and $D$ the discriminator. $\boldsymbol{x}$ and $\boldsymbol{z}$ represent the real samples and noise vectors, respectively. $p_{data}$ and $p_{\boldsymbol{z}}$ denote the distributions of the true samples and of the noise vectors fed to the generator, respectively. $\mathbb{E}$ is the mathematical expectation. $\tau$ is a regularization term, namely the Gaussian noise added to the synthesized samples. Finally,

5. Experiments and results

In this section, we conduct a series of experiments to evaluate the performance of our proposed model. The details of model settings and evaluation metrics are provided in Section 5.1. Specific experiments and corresponding results are described in Section 5.2. Section 5.3 carries out ablation experiments on our proposed model. The details of the experiments for the ASD subtypes are shown in Section 5.4. We discuss the limitations of our proposed method and future work in Section 5.5.

5.1. Model setting

The model parameter settings for each component of the proposed method are shown in Table 4. We evaluate the proposed model using 10-fold cross-validation to verify its robustness and reduce the instability of the model prediction. To assess the performance of the proposed model more comprehensively, three commonly used metrics, accuracy (ACC), sensitivity (SEN), and specificity (SPE), are adopted in our experiments. These metrics are defined as follows:

$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \tag{8}$$

$$SEN = \frac{TP}{TP + FN} \tag{9}$$

$$SPE = \frac{TN}{TN + FP} \tag{10}$$

where $TP$, $TN$, $FP$, and $FN$ represent true positive, true negative, false positive, and false negative, respectively. For the test samples, accuracy is assessed in two ways: sequence accuracy and subject accuracy. The "sequence accuracy" refers to the accuracy over the individual augmented input sequences. The "subject accuracy" is computed from the prediction obtained by a simple majority vote over all input sequences from a subject.

To better evaluate the imbalanced fMRI data of the ASD subtypes, the $F_1$-score and the Macro-average accuracy (Macro-acc) are chosen as evaluation metrics. These metrics are defined as follows,

$$F_1\text{-score} = \frac{2 \times SEN \times SPE}{SEN + SPE} \tag{11}$$

$$\text{Macro-acc} = \frac{F_1\text{-score}_1 + F_1\text{-score}_2 + F_1\text{-score}_3}{3} \tag{12}$$

The $F_1$-score can be interpreted as a weighted average of the precision and recall. The $F_1$-score reaches its best value at 1 and its worst at 0. The Macro-average is the arithmetic mean of the performance metrics for
GGDB is only used in the training set to prevent the problem of data each class. In addition, we use a slice-based data augmentation strategy
leakage. in two different classification tasks.
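As a minimal sketch of Eqs. (8)–(11), the metrics can be computed from a list of true labels and a list of predicted labels. The function names and the toy labels below are illustrative assumptions, not part of the original experiments; label 1 is assumed to denote ASD and label 0 a typical control.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary task (positive label assumed to be 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)      # Eq. (8)


def sensitivity(tp, fn):
    return tp / (tp + fn)                       # Eq. (9)


def specificity(tn, fp):
    return tn / (tn + fp)                       # Eq. (10)


def f1_from_sen_spe(sen, spe):
    return 2 * sen * spe / (sen + spe)          # Eq. (11), as written in the text


# Toy labels (hypothetical): 4 positives and 4 negatives.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, tn, fp, fn)
sen = sensitivity(tp, fn)
spe = specificity(tn, fp)
f1 = f1_from_sen_spe(sen, spe)
print(tp, tn, fp, fn)      # 3 3 1 1
print(acc, sen, spe, f1)   # 0.75 0.75 0.75 0.75
```

The sketch does not guard against empty classes; a fold without any positive (or negative) sample would raise a division-by-zero error and would need separate handling.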


Table 4
Model parameter settings for ST-Transformer and GGDB.
For ST-Transformer
Epochs 100
Optimizer (lr = 0.0001) Adam
Batch size 32
Settings in LSTMA unit LSMA: head = 10, 𝐷𝑘 = 𝐷𝑣 = 9; LTMA: head = 8, 𝐷𝑘 = 𝐷𝑣 = 25
Configuration of final fully connected layers 204-2
Activation function to output layers Sigmoid
For GGDB
Epochs 80
Optimizer (lr = 0.0001) Adam
Batch size 64
Configuration of generator 200-2048-18000
Configuration of discriminator 18000-4096-1
Activation function to hidden layers LeakyReLU
1 LSMA denotes spatial multi-headed linear attention.
2 LTMA denotes temporal multi-headed linear attention.
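The GGDB settings in Table 4 feed the objective in Eq. (7), whose only change from a standard GAN is the Gaussian noise term added to the generator output before it reaches the discriminator. The sketch below estimates that objective by Monte Carlo on scalar toy data. Everything in it is an assumption made for illustration: the one-dimensional `generator` and `discriminator` stand in for the fully connected networks of Table 4, and the noise scale `sigma` is hypothetical because the source does not state it.

```python
import math
import random


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def generator(z, w=2.0, b=0.5):
    # Hypothetical toy generator: an affine map of the noise vector z.
    return w * z + b


def discriminator(x, w=1.5, b=-0.2):
    # Hypothetical toy discriminator: logistic output in (0, 1).
    return sigmoid(w * x + b)


def ggdb_value(real_samples, noise_vectors, sigma, rng):
    """Monte Carlo estimate of E[log D(x)] + E[log(1 - D(G(z) + tau))]."""
    real_term = sum(math.log(discriminator(x)) for x in real_samples)
    real_term /= len(real_samples)
    fake_term = 0.0
    for z in noise_vectors:
        tau = rng.gauss(0.0, sigma)  # Gaussian noise added after generation
        fake_term += math.log(1.0 - discriminator(generator(z) + tau))
    fake_term /= len(noise_vectors)
    return real_term + fake_term


rng = random.Random(0)
real = [rng.gauss(1.0, 0.5) for _ in range(1000)]   # stand-in for real samples
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # z drawn from p_z
value = ggdb_value(real, noise, sigma=0.1, rng=rng)
print(value)  # a negative number, since both log terms are negative
```

In training, the discriminator would be updated to increase this value and the generator to decrease it; setting `sigma` to zero recovers the standard GAN value function.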

5.2. Performance comparison

In this section, Section 5.2.1 describes in detail the experimental comparison of our proposed model with Transformer-based methods and traditional deep learning methods. Several state-of-the-art methods are compared with our proposed model in Section 5.2.2.

5.2.1. Comparison with different models
In this experiment, we compare our proposed ST-Transformer with the following five methods: three Transformer-based variants, LSTM, and a one-dimensional convolutional neural network (1DCNN). The performance of the final model is validated on both the ABIDE I and ABIDE II datasets by 10-fold cross-validation.
Transformer: The vanilla Transformer is a representative sequential model for sequence data. In this experiment, the vanilla Transformer is constructed with eight-headed self-attention blocks. A fully connected layer with a softmax activation function is appended to perform ASD classification. Moreover, we select the encoder and decoder as separate models for experimental comparison. In the decoder, we remove the cross-attention and keep only the masked self-attention part of the attention module. The other settings in the separate encoder and decoder models are the same as for the Transformer.
LSTM: LSTM, one of the most widely used classification models in neuroimaging analysis, achieves good results among traditional deep learning methods for processing sequence data. In this experiment, the hidden size is 64. We add a fully connected layer after the LSTM for the final ASD classification.
1DCNN: 1DCNN, one of the traditional deep learning methods, has been successfully applied in the field of computer vision. In this method, the number of output channels is set to 128, the size of the convolution kernel is 2, and the stride is fixed to 1. To alleviate overfitting, we add batch normalization after the convolution layer along with max pooling. Finally, a fully connected layer is appended for the diagnosis of ASD.
For a fair comparison, we generally keep the number of parameters of our proposed ST-Transformer consistent with that of the comparison algorithms. Furthermore, we combine the demographic information with the output of each model separately to form models with prior information. In each model with demographic information, the combined features are passed to a fully connected layer to achieve ASD identification.
We tabulate the classification performance obtained by the different methods on ABIDE I and ABIDE II after 10-fold cross-validation in Table 5. As shown in Table 5, our proposed model achieves a reliable accuracy of 71.01% (SPE: 70.01%, SEN: 72.02%) and 70.61% (SPE: 72.27%, SEN: 68.75%) on ABIDE I and ABIDE II, respectively. This is probably because our proposed model is able to effectively utilize the spatio-temporal domain representation of fMRI data. Based on the results of the rank-sum test, our proposed model outperforms LSTM, CNN, and Transformers on most evaluation metrics. Among the traditional deep learning methods, the accuracies obtained by LSTM and 1DCNN are approximately the same on ABIDE I, and the performance of 1DCNN is slightly better than that of LSTM on ABIDE II. Among the Transformer-based models, the Transformer outperforms LSTM and 1DCNN for ASD classification, with an accuracy 1%–2% higher than LSTM and 1DCNN on both ABIDE I and ABIDE II. This might be attributed to the fact that the multi-headed self-attention module of the Transformer-based method pays more attention to the global information of the fMRI time series. Among the Transformer variants, the Transformer encoder is superior to the Transformer decoder for ASD identification. A possible reason is that the masked multi-headed self-attention module of the Transformer decoder blocks part of the information in the fMRI time series, which makes the Transformer decoder structure unsuitable for our task. Finally, we also compare the results with and without the demographic information. For these methods, demographic information assists the fMRI data in improving the classification performance of the models on both ABIDE I and ABIDE II.

5.2.2. Comparison with state-of-the-art
In addition to 10-fold cross-validation (CV), existing methods have also used independent sets of training/testing (IS) for performance evaluation. However, the distribution of data varies widely from site to site owing to different equipment and parameter settings. IS is sensitive to the training/testing split and cannot reliably assess the models. In contrast, 10-fold CV is a more reliable way to validate model performance.
We summarize the state-of-the-art studies on CAD methods for ASD diagnosis in Tables 6 and 7. We also show the accuracy of the state-of-the-art methods versus sample size in Fig. 4. Our proposed model shows superior performance on ABIDE I and ABIDE II. In comparison with traditional machine learning approaches, deep learning-based methods have dominated recent advances in mental disorder diagnosis studies. Most deep learning-based methods outperform machine learning-based methods on both the ABIDE I and ABIDE II datasets. This could be attributed to the effectiveness of deep learning in learning discriminative nonlinear feature representations. In addition, in previous studies, researchers tended to choose smaller datasets because of limited equipment performance or constraints on data collection. The heterogeneity introduced by larger datasets may cause model performance to decrease as the sample size grows. Our proposed model not only achieves better classification performance than all competitors but also uses a larger data sample than most previous studies. Generally speaking, the proposed model can achieve competitive and robust results in ASD diagnosis.

5.3. Ablation study

In this section, we perform ablation experiments to validate the effectiveness of the proposed modules. Firstly, we make a comparison between the proposed model with or without positional encoding, as

Table 5
Performance compared with Transformer-based methods and traditional deep learning methods.
Model ABIDE I ABIDE II
seq_acc seq_sen seq_spe sub_acc sub_sen sub_spe seq_acc seq_sen seq_spe sub_acc sub_sen sub_spe
LSTM 64.39%* 63.09%* 65.69%* 66.34%* 67.33%* 65.35%* 63.88%* 60.90%* 68.09%* 64.84%* 60.11%* 69.03%*
LSTM(w/pheno) 65.68% 62.31%* 68.86% 67.73%* 66.50%* 68.96% 64.48%* 60.52%* 68.02%* 66.07%* 64.92%* 67.09%*
1DCNN 64.64%* 62.35%* 66.82% 66.30%* 66.51%* 66.11%* 64.03%* 61.01%* 66.72%* 66.36%* 66.54% 66.19%*
1DCNN(w/pheno) 65.43%* 62.84%* 68.01% 68.18% 66.30%* 69.96% 65.49% 60.71% 69.76% 68.14% 64.54%* 71.37%
Transformer 65.61%* 64.21%* 67.02% 68.09%* 68.98%* 67.20%* 65.48% 62.19% 68.76% 67.81% 66.55% 69.07%*
Transformer(w/pheno) 66.46% 65.18% 67.74% 68.63% 68.34%* 68.92% 66.06% 63.23% 68.88% 68.44% 67.35% 69.54%
Transformer encoder 65.60%* 61.96%* 69.23% 67.63%* 66.97%* 68.29% 65.37% 62.35% 68.38%* 67.38%* 65.15%* 69.60%
Transformer encoder(w/pheno) 66.21% 64.76%* 67.57% 68.63% 68.55%* 68.71% 65.87% 63.71% 68.03%* 68.72% 69.14% 68.33%*
Transformer decoder 64.39%* 62.34%* 66.28%* 67.79%* 70.14% 65.48%* 62.42%* 58.78%* 65.67%* 64.84%* 63.93%* 65.66%*
Transformer decoder(w/pheno) 65.63%* 64.81%* 66.35%* 68.18% 68.74%* 67.57% 63.69%* 55.80%* 70.74% 66.07%* 59.93%* 71.56%
ST-Transformer\LA 66.67%(± 4.3%) 63.84% 69.49% 69.18%(± 5.9%) 67.33% 70.91% 66.41%(± 4.0%) 63.07% 69.38% 68.71%(± 4.3%) 68.54% 68.86%
ST-Transformer\LA(w/pheno) 67.25%(± 3.8%) 64.90% 69.46% 69.77%(± 5.6%) 69.56% 69.94% 67.20%(± 4.6%) 63.05% 70.91% 69.28%(± 5.6%) 66.33% 71.91%
ST-Transformer 67.92%(± 3.8%) 65.55% 70.15% 70.46%(± 4.8%) 70.76% 70.14% 67.61%(± 4.4%) 62.86% 71.83% 69.85%(± 4.7%) 66.53% 72.79%
ST-Transformer(w/pheno) 68.56%(± 3.1%) 67.85% 69.26% 71.01%(± 3.9%) 72.02% 70.01% 68.09%(± 3.7%) 64.26% 71.51% 70.61%(± 3.6%) 68.75% 72.27%
1 w/pheno denotes adding demographic information to the model.
2 seq_acc, seq_sen, and seq_spe refer to sequence accuracy, sequence sensitivity, and sequence specificity, respectively.
3 sub_acc, sub_sen, and sub_spe refer to subject accuracy, subject sensitivity, and subject specificity, respectively.
4 The ST-Transformer\LA method is another version of our proposed method, which uses self-attention instead of linear attention.
5 The numbers in () represent the standard deviation.
*p-value ≤ 0.05.
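The difference between the sequence-level (seq_) and subject-level (sub_) columns can be sketched as follows: each augmented sequence receives its own prediction, and a subject's prediction is the majority vote over that subject's sequences. The subject identifiers, labels, and helper names here are hypothetical. The tie-breaking rule (the label seen first wins) is also an assumption, since the source does not specify one.

```python
from collections import Counter, defaultdict


def subject_predictions(seq_preds):
    """Majority vote per subject.

    seq_preds: iterable of (subject_id, predicted_label), one per augmented sequence.
    Ties fall to the label encountered first (an arbitrary choice for this sketch).
    """
    votes = defaultdict(list)
    for subject_id, label in seq_preds:
        votes[subject_id].append(label)
    return {sid: Counter(labels).most_common(1)[0][0] for sid, labels in votes.items()}


# Hypothetical predictions: three subjects with three sequences each.
seq_preds = [
    ("s1", 1), ("s1", 1), ("s1", 0),
    ("s2", 0), ("s2", 1), ("s2", 0),
    ("s3", 1), ("s3", 0), ("s3", 0),
]
truth = {"s1": 1, "s2": 0, "s3": 1}

# Sequence accuracy: every sequence is scored against its subject's label.
seq_acc = sum(1 for sid, p in seq_preds if p == truth[sid]) / len(seq_preds)

# Subject accuracy: one voted prediction per subject.
sub_preds = subject_predictions(seq_preds)
sub_acc = sum(1 for sid, p in sub_preds.items() if p == truth[sid]) / len(sub_preds)

print(sub_preds)  # {'s1': 1, 's2': 0, 's3': 0}
print(seq_acc)    # 5 of 9 sequences correct
print(sub_acc)    # 2 of 3 subjects correct
```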


Table 6
Performance compared with previous literature on ABIDE I.
Method Classifier Validation Sample# Accuracy
Ours ST-Transformer 10-fold CV 1009 71.01%
Yang 2022[43] kSVM 5-fold CV 871 69.43%
Almuqhim 2021[44] SAE+DNN 10-fold CV 1035 70.80%
Abdelbasset 2020[45] SVC intra-site CV 172 70.36%
Shahamat 2020[46] 3D-CNN 5-fold CV 1000 70.00%
You 2020[47] CNN 10-fold CV 106 68.54%
Bengs 2019[48] convGRU-CNN3D IS 194 67.00%
El-Gazzar 2019[49] 1DCNN 5-fold CV 1100 64.00%
Heinsfeld 2018[34] AE+ANN 10-fold CV 1035 70.00%
Dvornek 2018[50] RawPhenotype-LSTM 10-fold CV 1100 70.10%
Dvornek 2017[51] LSTM 10-fold CV 1100 68.50%
Rane 2017[52] LR 5-fold CV 1112 62.00%

Fig. 4. Performance compared with previous literature on ABIDE I&II.

Table 7
Performance compared with previous literature on ABIDE II.
Method Classifier Validation Sample# Accuracy
Ours ST-Transformer 10-fold CV 1058 70.61%
Liu 2021[53] BL 10-fold CV 1043 65.29%
Chen 2020[54] HBM 10-fold CV 250 65.00%
Aghdam 2019[55] MCNNEs 10-fold CV 343 70.00%
Zhao 2019[56] CAE+CNN IS 693 65.30%

seen in Table 8 section A, the classification accuracy is reduced by 1%–2% when positional encoding is added compared to that without positional encoding. The results reveal that although positional encoding can provide position information, the input fMRI data are already sequential time-series data that implicitly contain position information. Adding an additional positional embedding reinforces the location information and interferes with the attention mechanism in learning the discriminative feature representation. Secondly, we evaluate the performance of our proposed linear attention mechanism in terms of the accuracy and time efficiency of the training process. As shown in Table 8 section B, our proposed model achieves higher accuracy than ST-Transformer without linear attention (denoted ST-Transformer\LA) on both ABIDE I and ABIDE II, and the time for model training is 29% less. Thirdly, we performed experiments on the proposed method with and without the data augmentation strategy to assess the necessity of the data augmentation method for fMRI. As shown in Table 8 section C, our proposed method achieves a significant improvement in ASD diagnosis performance when using data augmentation. This might be due to the reliance of deep learning methods on large sample data. Fourthly, we conducted experiments with our proposed model on the combined ABIDE (ABIDE I & II) dataset. From Table 8 section D we can see that the experimental performance is slightly degraded compared to that on ABIDE I and ABIDE II separately. This is probably because, as the data sample increases, more samples reflecting ASD heterogeneity emerge, which makes it difficult for our model to discover potential biomarkers for ASD diagnosis.
Furthermore, we carry out experiments with our proposed model using multi-headed linear attention only in the temporal domain and only in the spatial domain, respectively. As Fig. 5 shows, our proposed model makes good use of both temporal and spatial feature information, and the classification accuracies obtained by the model are better than those obtained by multi-headed linear attention in the temporal domain only or in the spatial domain only. To explore the structure of the proposed ST-Transformer, we compare different numbers of layers in the encoder of the proposed model. As shown in Fig. 6, the classification accuracy decreases as the number of layers increases. This is probably because the scale of our dataset is relatively small compared to those in the computer vision and NLP domains, and significant overfitting occurs when the number of layers is increased.

5.4. Identifying ASD subtypes

In this section, we apply the proposed GGDB strategy to our ST-Transformer model in experiments. To verify the effectiveness of our


Table 8
Results of the ablation study.
Method ABIDE I ABIDE II
Section A: Performance comparison to evaluate the effectiveness of positional encoding.
seq_acc seq_sen seq_spe sub_acc sub_sen sub_spe seq_acc seq_sen seq_spe sub_acc sub_sen sub_spe
ST-Transformer(w/pe) 67.58% 66.91% 68.21% 69.47% 70.58% 68.40% 65.85% 62.89% 68.49% 68.53% 68.75% 68.33%
ST-Transformer(w/o_pe) 68.56% 67.85% 69.26% 71.01% 72.02% 70.01% 68.09% 64.26% 71.51% 70.61% 68.75% 72.27%
Section B: Performance comparison to evaluate the effectiveness of our proposed linear attention mechanism.
sub_acc Time (h) sub_acc Time (h)
ST-Transformer\LA(w/pheno) 69.77% 1.8 69.28% 2.1
ST-Transformer(w/pheno) 71.01% 1.3 70.61% 1.7
Section C: Performance comparison to evaluate the effectiveness of data augmentation.
sub_acc sub_sen sub_spe sub_acc sub_sen sub_spe
ST-Transformer(w/o_da) 65.02% 65.89% 64.14% 65.32% 61.75% 68.52%
ST-Transformer(w/da) 71.01% 72.02% 70.01% 70.61% 68.75% 72.27%
Section D: The results of our method on ABIDE I & II.
Method Dataset seq_acc seq_sen seq_spe sub_acc sub_sen sub_spe
ST-Transformer(w/pheno) ABIDE I 68.56% 67.85% 69.26% 71.01% 72.02% 70.01%
ST-Transformer(w/pheno) ABIDE II 68.09% 64.26% 71.51% 70.61% 68.75% 72.27%
ST-Transformer(w/pheno) ABIDE I&II 66.80% 64.72% 68.72% 69.24% 68.75% 69.72%
1 w/pe and w/o_pe denote the model with and without positional encoding, respectively.
2 w/da and w/o_da denote our method with and without data augmentation, respectively.

Table 9
Results of using data balancing strategies.
Balancing strategy seq_𝐹1-score(Autism) seq_𝐹1-score(Asperger) seq_𝐹1-score(PDD-NOS) seq_Macro-acc(Total) sub_𝐹1-score(Autism) sub_𝐹1-score(Asperger) sub_𝐹1-score(PDD-NOS) sub_Macro-acc(Total)
w/o_db 75.90% 46.47% 52.10% 58.16% 77.51% 48.41% 55.87% 60.60%
Slicing 74.59% 45.18% 57.96% 59.24% 75.11% 50.19% 58.56% 61.29%
Gaussian 74.97% 46.58% 55.83% 59.13% 77.20% 49.48% 57.86% 61.51%
GAN 74.78% 49.35% 54.47% 59.53% 76.73% 50.40% 60.28% 62.47%
GGDB 75.94% 52.04% 54.07% 60.68% 77.88% 55.60% 60.11% 64.53%
1 w/o_db means that no data balancing strategy is used.
2 seq_𝐹1-score, seq_Macro-acc, sub_𝐹1-score, and sub_Macro-acc represent sequence 𝐹1-score, sequence Macro-average, subject 𝐹1-score, and subject Macro-average, respectively.
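As a short worked check, the Total columns of Table 9 are the arithmetic mean of the three per-class 𝐹1-scores in the same row, as in Eq. (12). The sketch below reproduces two of them from the table's own per-class values; the helper name `macro_average` is illustrative.

```python
def macro_average(per_class_scores):
    """Arithmetic mean of per-class scores, as in Eq. (12)."""
    return sum(per_class_scores) / len(per_class_scores)


# Per-class F1-scores (%) for Autism, Asperger, PDD-NOS, copied from Table 9.
ggdb_sub_f1 = [77.88, 55.60, 60.11]     # GGDB, subject level
no_db_seq_f1 = [75.90, 46.47, 52.10]    # w/o_db, sequence level

ggdb_sub_macro = round(macro_average(ggdb_sub_f1), 2)
no_db_seq_macro = round(macro_average(no_db_seq_f1), 2)
print(ggdb_sub_macro)    # 64.53, the sub_Macro-acc reported for GGDB
print(no_db_seq_macro)   # 58.16, the seq_Macro-acc reported for w/o_db
```

Because every class carries equal weight in this mean, a gain on a minority class such as Asperger moves the Total column as much as the same gain on the majority class.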

Fig. 5. Performance comparison to evaluate the effectiveness of LSTMA unit on ABIDE I&II.

proposed method, we compare our method with three commonly used data augmentation methods: slicing, adding Gaussian noise, and GAN. It is worth mentioning that these three strategies for handling data imbalance are also used on the training set only. Table 9 shows the results of the GGDB method and the other three methods for identifying ASD subtypes from fMRI data. As we can see from Table 9, our proposed method achieves 𝐹1-scores of 77.88%, 55.60%, and 60.11% on autism, Asperger, and PDD-NOS, respectively. Its corresponding Macro-average is 64.53%. These results are better than those of the other three methods. This is because GGDB is able to learn the hidden representation and distribution of the original data in order to generate additional data samples. It can also be seen that all four methods for dealing with data imbalance exhibit biased predictive capability to varying degrees: the 𝐹1-score is higher for the majority class samples and relatively lower for the minority class samples. These results suggest that our proposed method addresses the data imbalance problem in ASD subtype diagnosis to a certain extent and achieves effective performance compared with the other methods.
In Fig. 7, we plot the distributions of the real and generated samples using PCA (Principal Component Analysis) to further analyze the quality of the data generated by GGDB. PCA is a commonly used method for projecting features from a high-dimensional space onto a two-dimensional plane for better visualization. Polynomial regression (PR) is also used to fit the two-dimensional points in order to analyze the distribution of real and generated data from Asperger and PDD-NOS. The PR function approximates the boundaries of the real and generated data from Asperger and PDD-NOS with differently colored curves. According to the curves plotted by the PR functions, the fitted boundaries of the real data and the generated data for the two classes are very close, which demonstrates the high quality of our generated data. Besides, compared with the real data, the generated data have a larger confidence interval (the shaded region around the polynomial curves), which helps to enlarge the decision boundaries on the hyperplane for the classifier. In other words, the data generated by GGDB contain not only sample points near the original sample distribution but also data samples outside the original sample distribution. Therefore, GGDB is able to lead to better performance and robustness of the classifier.


Fig. 6. Performance comparison to evaluate the effectiveness of layers on ABIDE I&II.

Fig. 7. Two-dimensional PCA visualizations of the real and generated feature from Asperger and PDD-NOS.

5.5. Limitations and future work

Our proposed method exhibits certain competitive features. Firstly, compared to FC-based methods, our approach is an end-to-end deep learning framework that does not require designing complex feature extraction algorithms. Secondly, compared with the time series-based approach, it is not limited to a single dimension of information. The proposed model effectively solves the problem that traditional time series-based deep learning methods are prone to lose spatial information. Finally, demographic information is added to the model to further improve the diagnosis performance for ASD. The current work is expected to provide insights into the development of effective CAD methods for ASD diagnosis. While our proposed ST-Transformer performs well in ASD diagnosis, several limitations should be carefully addressed to improve its performance and practical value.
First, although our data volume is relatively large compared to previous literature, our study is still a small-sample study for deep learning models. In the future, we plan to combine fMRI data with multiple atlases to increase the sample amount required for deep learning model training. Second, due to the different scanning machines and parameter settings at each ABIDE site, there may be domain gaps in the fMRI data collected from different sites. In future work, transfer learning can be adopted in our framework to address the domain gap problem and further improve our classification performance. Third, although GGDB is able to learn hidden representations and distributions of the original data to generate additional data, GGDB exhibits a large bias in the 𝐹1-score. We will aim to improve GGDB to address this issue.

6. Conclusion

In this work, an effective ST-Transformer deep learning framework has been presented for the identification of ASD from fMRI data. Specifically, an LSTMA unit has been proposed to capture the information in the spatio-temporal domain of fMRI data while accelerating the training process of the model. In addition, to further identify ASD subtypes, a GGDB strategy has been presented to solve the data imbalance problem of ASD subtype fMRI data. The proposed model effectively solves the problem that traditional time series-based deep learning methods are prone to lose spatial information. Through a series of simulation experiments on ABIDE, the effectiveness and reliability of ST-Transformer have been demonstrated.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] K. Lyall, L. Croen, J. Daniels, M.D. Fallin, C. Ladd-Acosta, B.K. Lee, B.Y. Park, N.W. Snyder, D. Schendel, H. Volk, et al., The changing epidemiology of autism spectrum disorders, Annu. Rev. Public Health 38 (2017) 81–102.
[2] R.E. Nickel, L. Huang-Storms, Early identification of young children with autism spectrum disorder, Indian J. Pediatr. 84 (2017) 53–60.
[3] T. Chung, J. Cornelius, D. Clark, C. Martin, Greater prevalence of proposed ICD-11 alcohol and cannabis dependence compared to ICD-10, DSM-IV, and DSM-5 in treated adolescents, Alcohol. Clin. Exp. Res. 41 (9) (2017) 1584–1592.
[4] B. Crosson, A. Ford, K.M. McGregor, M. Meinzer, S. Cheshkov, X. Li, D. Walker-Batson, R.W. Briggs, Functional imaging and related techniques: an introduction for rehabilitation researchers, J. Rehabil. Res. Dev. 47 (2010) vii.
[5] M. Greicius, Resting-state functional connectivity in neuropsychiatric disorders, Curr. Opin. Neurol. 21 (2008) 424–430.
[6] X. Li, N.C. Dvornek, J. Zhuang, P. Ventola, J. Duncan, Graph embedding using infomax for ASD classification and brain functional difference detection, 11317, 2020, 1131702.


[7] Y. Kong, J. Gao, Y. Xu, Y. Pan, J. Wang, J. Liu, Classification of autism spectrum disorder by combining brain connectivity and deep neural network classifier, Neurocomputing 324 (2019) 63–68.
[8] V. Mnih, N. Heess, A. Graves, et al., Recurrent models of visual attention, in: Advances in Neural Information Processing Systems, 2014, pp. 2204–2212.
[9] J.F. DeRose, J. Wang, M. Berger, Attention flows: Analyzing and comparing attention mechanisms in language models, IEEE Trans. Vis. Comput. Graphics 27 (2) (2021) 1160–1170.
[10] S. Kitada, H. Iyatomi, Attention meets perturbations: Robust and interpretable attention with adversarial training, IEEE Access 9 (2021) 92974–92985.
[11] G. Liu, J. Guo, Bidirectional LSTM with attention mechanism and convolutional layer for text classification, Neurocomputing 337 (2019) 325–338.
[12] A. Roy, M. Saffar, A. Vaswani, D. Grangier, Efficient content-based sparse attention with routing transformers, Trans. Assoc. Comput. Linguist. 9 (2021) 53–68.
[13] M. Chen, A. Radford, R. Child, J. Wu, H. Jun, D. Luan, I. Sutskever, Generative pretraining from pixels, 2020, pp. 1691–1703.
[14] S. Woo, J. Park, J.-Y. Lee, I.S. Kweon, Cbam: Convolutional block attention module, in: Proceedings of the European Conference on Computer Vision, ECCV, 2018, pp. 3–19.
[15] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, S. Zagoruyko, End-to-end object detection with transformers, 2020, pp. 213–229.
[16] N. Parmar, A. Vaswani, J. Uszkoreit, L. Kaiser, N. Shazeer, A. Ku, D. Tran, Image transformer, 2018, pp. 4055–4064.
[17] X. Chen, Y. Wu, Z. Wang, S. Liu, J. Li, Developing real-time streaming transformer transducer for speech recognition on large-scale dataset, in: ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, IEEE, 2021, pp. 5904–5908.
[18] L. Dong, S. Xu, B. Xu, Speech-transformer: a no-recurrence sequence-to-sequence model for speech recognition, in: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, IEEE, 2018, pp. 5884–5888.
[19] Y. Qiu, S. Yu, Y. Zhou, D. Liu, X. Song, T. Wang, B. Lei, Multi-channel sparse graph transformer network for early alzheimer’s disease identification, in: 2021 IEEE 18th International Symposium on Biomedical Imaging, ISBI, 2021, pp. 1794–1797.
[20] T. Zhang, C. Li, P. Li, Y. Peng, X. Kang, C. Jiang, F. Li, X. Zhu, D. Yao, B. Biswal, P. Xu, Separated channel attention convolutional neural network (SC-CNN-attention) to identify ADHD in multi-site rs-fMRI dataset, Entropy 22 (8) (2020).
[21] C. Yang, P. Wang, J. Tan, Q. Liu, X. Li, Autism spectrum disorder diagnosis using graph attention network based on spatial-constrained sparse functional brain networks, Comput. Biol. Med. 139 (2021) 104963.
[22] W. Yin, L. Li, F.-X. Wu, A graph attention neural network for diagnosing ASD with fMRI data, in: 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM, IEEE, 2021, pp. 1131–1136.
[23] K. Niu, J. Guo, Y. Pan, X. Gao, X. Peng, N. Li, H. Li, Multichannel deep attention neural networks for the classification of autism spectrum disorder using neuroimaging and personal characteristic data, Complexity 2020 (2020).
[24] C. Wang, Z. Xiao, J. Wu, Functional connectivity-based classification of autism and control using SVM-RFECV on rs-fMRI data, Phys. Medica 65 (2019) 99–105.
[25] F. Sadeghian, H. Hasani, M. Jafari, Feature selection based on genetic algorithm in the diagnosis of autism disorder by fMRI, Casp. J. Neurol. Sci. 7 (2) (2021) 74–83.
[26] N. Wang, D. Yao, L. Ma, M. Liu, Multi-site clustering and nested feature extraction for identifying autism spectrum disorder with resting-state fMRI, Med. Image Anal. 75 (2022) 102279.
[27] S.Y. Yap, W.H. Chan, Elastic SCAD SVM cluster for the selection of significant functional connectivity in autism spectrum disorder classification, Acad. Fundam. Comput. Res. 1 (2) (2020).
[28] X. Ma, X.-H. Wang, L. Li, Identifying individuals with autism spectrum disorder based on the principal components of whole-brain phase synchrony, Neurosci. Lett. 742 (2021) 135519.
[29] G. Wen, P. Cao, H. Bao, W. Yang, T. Zheng, O. Zaiane, MVS-GCN: A prior brain structure learning-guided multi-view graph convolution network for autism spectrum disorder diagnosis, Comput. Biol. Med. (2022) 105239.
[30] A. Loddo, S. Buttau, C. Di Ruberto, Deep learning based pipelines for Alzheimer’s disease diagnosis: A comparative study and a novel deep-ensemble method, Comput. Biol. Med. (2021) 105032.
[31] H. Jiang, P. Cao, M. Xu, J. Yang, O. Zaiane, Hi-GCN: A hierarchical graph convolution network for graph embedding learning of brain network and brain disorders prediction, Comput. Biol. Med. 127 (2020) 104096.
[32] A. Puente-Castro, E. Fernandez-Blanco, A. Pazos, C.R. Munteanu, Automatic
[34] A.S. Heinsfeld, A.R. Franco, R.C. Craddock, A. Buchweitz, F. Meneguzzi, Identification of autism spectrum disorder using deep learning and the ABIDE dataset, NeuroImage: Clinical 17 (2018) 16–23.
[35] T. Eslami, V. Mirjalili, A. Fong, A.R. Laird, F. Saeed, ASD-DiagNet: A hybrid learning approach for detection of autism spectrum disorder using fMRI data, Front. Neuroinform. 13 (2019) 70.
[36] R. Liu, Z.-a. Huang, M. Jiang, K.C. Tan, Multi-LSTM networks for accurate classification of attention deficit hyperactivity disorder from resting-state fMRI data, in: 2020 2nd International Conference on Industrial Artificial Intelligence, IAI, 2020, pp. 1–6.
[37] N.C. Dvornek, P. Ventola, K.A. Pelphrey, J.S. Duncan, Identifying autism from resting-state fMRI using long short-term memory networks, in: Q. Wang, Y. Shi, H.-I. Suk, K. Suzuki (Eds.), Machine Learning in Medical Imaging, Springer International Publishing, Cham, 2017, pp. 362–370.
[38] K. Byeon, J. Kwon, J. Hong, H. Park, Artificial neural network inspired by neuroimaging connectivity: Application in autism spectrum disorder, in: 2020 IEEE International Conference on Big Data and Smart Computing (BigComp), 2020, pp. 575–578.
[39] M.A. Bayram, Ö. İlyas, F. Temurtaş, Deep learning methods for autism spectrum disorder diagnosis based on fMRI images, Sakarya Univ. J. Comput. Inf. Sci. 4 (1) (2021) 142–155.
[40] C. Craddock, Y. Benhajali, C. Chu, F. Chouinard, A. Evans, A. Jakab, B.S. Khundrakpam, J.D. Lewis, Q. Li, M. Milham, et al., The neuro bureau preprocessing initiative: open sharing of preprocessed neuroimaging data and derivatives, Front. Neuroinform. 7 (2013).
[41] C.-G. Yan, X.-D. Wang, X.-N. Zuo, Y.-F. Zang, DPABI: data processing & analysis for (resting-state) brain imaging, Neuroinformatics 14 (3) (2016) 339–351.
[42] A.D. Rasamoelina, F. Adjailia, P. Sinčák, A review of activation function for artificial neural network, in: 2020 IEEE 18th World Symposium on Applied Machine Intelligence and Informatics, SAMI, 2020, pp. 281–286.
[43] X. Yang, N. Zhang, P. Schrader, A study of brain networks for autism spectrum disorder classification using resting-state functional connectivity, Mach. Learn. Appl. 8 (2022) 100290.
[44] F. Almuqhim, F. Saeed, ASD-SAENet: A sparse autoencoder, and deep-neural network model for detecting autism spectrum disorder (ASD) using fMRI data, Front. Comput. Neurosci. 15 (2021) 27.
[45] A. Brahim, N. Farrugia, Graph Fourier transform of fMRI temporal signals based on an averaged structural connectome for the classification of neuroimaging, Artif. Intell. Med. 106 (2020) 101870.
[46] H. Shahamat, M.S. Abadeh, Brain MRI analysis using a deep learning based evolutionary approach, Neural Netw. 126 (2020) 218–234.
[47] Y. You, H. Liu, S. Zhang, L. Shao, Classification of autism based on fMRI data with feature-fused convolutional neural network, in: Cyberspace Data and Intelligence, and Cyber-Living, Syndrome, and Health, Springer, 2020, pp. 77–88.
[48] M. Bengs, N.T. Gessert, A. Schlaefer, 4D spatio-temporal deep learning with 4D fMRI data for autism spectrum disorder classification, in: Medical Imaging with Deep Learning, MIDL 2019 Conference, 2019, pp. 1–4.
[49] A. El-Gazzar, M. Quaak, L. Cerliani, P. Bloem, G.v. Wingen, R. Mani Thomas, A hybrid 3DCNN and 3DC-LSTM based model for 4D spatio-temporal fMRI data: an ABIDE autism classification study, in: OR 2.0 Context-Aware Operating Theaters and Machine Learning in Clinical Neuroimaging, Springer, 2019, pp. 95–102.
[50] N.C. Dvornek, P. Ventola, J.S. Duncan, Combining phenotypic and resting-state fMRI data for autism classification with recurrent neural networks, in: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), IEEE, 2018, pp. 725–728.
[51] N.C. Dvornek, P. Ventola, K.A. Pelphrey, J.S. Duncan, Identifying autism from resting-state fMRI using long short-term memory networks, 2017, pp. 362–370.
[52] S. Rane, E. Jolly, A. Park, H. Jang, C. Craddock, Developing predictive imaging biomarkers using whole-brain classifiers: Application to the ABIDE I dataset, Res. Ideas and Outcomes 3 (2017) e12733.
[53] J.-c. Liu, J.-z. Ji, Classification method of fMRI data based on broad learning system, J. ZheJiang Univ. (Engineering Science) 55 (7) 1270–1278.
[54] T. Chen, Y. Chen, M. Yuan, M. Gerstein, T. Li, H. Liang, T. Froehlich, L. Lu, et al., The development of a practical artificial intelligence tool for diagnosing and evaluating autism spectrum disorder: multicenter study, JMIR Med. Inform. 8 (5) (2020) e15767.
[55] M.A. Aghdam, A. Sharifi, M.M. Pedram, Diagnosis of autism spectrum disorders in young children based on resting-state functional magnetic resonance imaging data using convolutional neural networks, J. Digit. Imaging 32 (6) (2019) 899–918.
[56] Y. Zhao, H. Dai, W. Zhang, F. Ge, T. Liu, Two-stage spatial temporal deep learning framework for functional brain network modeling, in: 2019 IEEE 16th
assessment of Alzheimer’s disease diagnosis based on deep learning techniques, International Symposium on Biomedical Imaging (ISBI 2019), IEEE, 2019, pp.
Comput. Biol. Med. 120 (2020) 103764. 1576–1580.
[33] M. Leming, J.M. Górriz, J. Suckling, Ensemble deep learning on large, mixed-
site fMRI datasets in autism and other tasks, Int. J. Neural Syst. 30 (07) (2020)
2050012.

