
Applied Soft Computing 129 (2022) 109631

Multimodal brain tumor detection using multimodal deep transfer learning

Parvin Razzaghi a,∗, Karim Abbasi b,c, Mahmoud Shirazi d, Shima Rashidi e

a Artificial Intelligence Lab, Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran
b Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
c Laboratory of System Biology, Bioinformatics & Artificial Intelligent in Medicine (LBB&AI), Faculty of Mathematics and Computer Science, Kharazmi University, Tehran, Iran
d Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran
e Department of Computer Science, College of Science and Technology, University of Human Development, Sulaymaniyah, Kurdistan Region, Iraq

Article info

Article history:
Received 20 November 2021
Received in revised form 13 September 2022
Accepted 14 September 2022
Available online 20 September 2022

Keywords:
Multimodal brain images
Domain adaptation
Deep learning
Brain image segmentation

Abstract

MRI brain image analysis, including brain tumor detection, is a challenging task. MRI images are multimodal, and in recent years multimodal medical image analysis has received increasing attention. Modes refer to data from multiple sources that are semantically correlated and sometimes provide complementary information to each other. In this paper, modalities in MRI brain images refer to the different planes of view (axial, sagittal, and coronal) in which MRI images are taken. In most of the literature on multimodal data analysis, it is assumed that all modalities are available for all samples. In medical image analysis, this assumption is often not valid, and only some modalities might be available for each sample. Knowledge transfer between and within modalities is used to tackle this challenge in MRI brain image segmentation. For knowledge transfer, domain adaptation is an important step that deals with the problem of having different distributions in the training and test sets. These challenges have not been considered in recent multimodal brain image analysis studies. This paper proposes a new multimodal deep transfer learning approach for MRI brain image analysis. The main differences of the proposed approach with respect to other multimodal brain image analyses are (1) a new multimodal feature encoder and (2) a new multimodal adaptation technique that handles the different distributions of the training and test sets. The proposed approach is evaluated on the IBSR and Figshare brain tumor datasets. The results confirm that it significantly outperforms the other comparable approaches.
© 2022 Elsevier B.V. All rights reserved.

Code metadata

Permanent link to reproducible Capsule: https://doi.org/10.24433/CO.7298109.v1. The code (and data) in this article has been certified as Reproducible by Code Ocean (https://codeocean.com/). More information on the Reproducibility Badge Initiative is available at https://www.elsevier.com/physical-sciences-and-engineering/computer-science/journals.

∗ Corresponding author. E-mail address: [email protected] (P. Razzaghi).
https://doi.org/10.1016/j.asoc.2022.109631
1568-4946/© 2022 Elsevier B.V. All rights reserved.

1. Introduction

This paper investigates the Magnetic Resonance Imaging (MRI) brain image segmentation problem. It is a difficult task because it involves dealing with different types of brain regions, brain tumors, and their complexity. MRI brain images provide information about human soft tissue, which is helpful in the diagnosis of brain tumor types. In recent years, utilizing more than one modality (i.e., multimodal data) in biomedical imaging has received increasing attention [1–3]. The unimodal normality assumptions are not correct for this type of data: the training data has a multimodal distribution with two or more modes. Hence, unimodal machine learning approaches are not effective here.

Modality in machine learning refers to the different ways by which knowledge is obtained [1]. Modes refer to data from multiple sources that are semantically correlated and sometimes provide complementary information to each other, meaning that some patterns are not visible in individual modalities on their own. For example, modality could refer to a situation where imaging includes angiography and computed tomography (CT) scanners, ultrasound, medical radiation, and magnetic resonance imaging (MRI); or it could refer to different MRI sequences such as T1-weighted and T2-weighted scans; or it could refer to different planes of view: the axial, sagittal, and coronal planes. Multimodal machine learning aims to build models that can process knowledge from multiple modalities.
Up to now, most approaches in multimodal learning assume that all modalities are available for all data. However, in many situations only some modalities are available for each sample. It should be noted that this setting covers the situation in which all modalities are available for all data; hence, it preserves the generality of the proposed approach. Current approaches fall into two categories: (1) the modality knowledge is ignored, and a unimodal model is learned; (2) the modality knowledge of each sample is involved in learning the model.

In the following, we review the approaches in the second category. In recent years, multimodal deep learning has received increasing attention [2–5]. Guo et al. [6] introduced a deep-learning-based medical image segmentation in the presence of multimodal inputs. To this end, they introduced three levels of cross-modality fusion: the feature learning level, the classifier level, and the decision-making level. At the feature learning level, three modalities of one sample, namely Magnetic Resonance Imaging (MRI), Computed Tomography (CT), and Positron Emission Tomography (PET), are first concatenated and then fed into the feature learning network. Feature-level fusion is done in the same way in other works too [7,8]. At the classifier level, each modality is fed into a specific feature extraction network, and the resulting features are then concatenated and fed into the classification layers. In works [9–12], multimodal fusion is done at the classifier level. At the decision-making level, a single-modality classifier is learned for each modality, and the final decision is made by fusing the outputs of all classifiers (i.e., voting). Guo et al. [6] conclude that, in brain tumor segmentation, cross-modality fusion performs better at the feature level and the classifier level than at the decision-making level. Baltrusaitis et al. [1] introduced five challenges in multimodal machine learning: representation, translation, alignment, fusion, and co-learning. In the translation step, transferring knowledge from one modality to a different modality is investigated, and in the co-learning step, transferring knowledge between modalities is investigated. Domain adaptation is one of the main steps in knowledge transfer between or within modalities. By investigating recent works, it is found that domain adaptation in brain image segmentation and brain tumor detection has not received enough attention. This paper investigates this issue.

In classical machine learning, it is assumed that the training and test domains have the same distribution; hence, a model learned on the training data can successfully be applied to the test domain. However, this assumption is questionable in many situations, and the training and test domains often have different distributions. Domain adaptation addresses this issue by learning a feature representation for the test domain such that the training and test domains have the same distribution [13]. In recent years, domain adaptation has received much attention in object detection [14–16], natural language processing [17–19], bioinformatics [20–22], and computer vision [23]. Since MRI brain images are taken from patients under different conditions, the training and test distributions differ. Deep learning improves performance in many research areas; however, these techniques assume that the deep model learns and infers on the same distribution space. To address this issue, deep transfer learning has emerged as an efficient tool and is used in many research areas such as computer vision [23], NLP, and drug discovery [24]. However, it has received little attention in medical image analysis, especially MRI brain tumor detection. Given that MRI brain images are acquired in multiple views and each view's representation space is different, it is effective to utilize deep transfer learning to analyze these images. To the best of our knowledge, this is the first study to assess multimodal domain adaptation techniques in MRI brain image segmentation.

In the proposed approach, it is assumed that not all modalities are available for each sample; that is, a different set of modalities may be available for each sample. The proposed approach has three steps. In the first step, we design a multimodal feature encoder for the training set. The proposed multimodal feature encoder network has shared layers that learn the common representation between modalities, and each modality has specific layers that learn a modality-specific representation, followed by task-specific prediction layers. In the second step, the feature encoder network for the test domain is learned. To this end, a new multimodal domain adaptation technique is proposed in which the transfer of knowledge happens between the corresponding modalities; the technique is based on the adversarial approach. In the third step, a test sample is fed into the test feature encoder network and then into the modality-specific prediction layers to predict outputs. The final label is obtained using a voting mechanism.

The main differences between the proposed work and similar work are as follows:

(a) The proposed work is the first work that considers the distribution differences between modalities in brain tumor detection by adapting modalities. This leads the algorithm to find a more discriminative feature representation space.
(b) We propose a new approach for modality adaptation by utilizing the adversarial domain adaptation technique. To this end, the base adversarial domain adaptation technique is modified such that it can be applied to multimodal data.

This paper is organized as follows: Section 2 reviews the related work, Section 3 presents the proposed method in detail, Section 4 reports the experimental results, and Section 5 concludes the paper.

2. Related work

This section investigates the state-of-the-art approaches in MRI brain image detection and multimodal learning. One of the main contributions of this approach is modality adaptation in multimodal data; hence, we also review the multimodal learning approaches.

MRI brain image detection. Zhang et al. [25] introduced an MRI brain image classification approach. First, they extract features from the images by utilizing the wavelet transform. Next, principal component analysis (PCA) is used to reduce the feature dimension. Then, these reduced features are fed into a neural network. Abiwinanda et al. [26], instead of utilizing pre-trained CNN networks, trained five introduced convolutional neural networks to classify brain tumor types from MRI images. Kutlu and Avcı [27] presented an approach to classify tumors in medical images to improve disease diagnosis. They utilized the pre-trained AlexNet CNN architecture [28] to extract features. Next, the obtained feature vector is fed into a 1D discrete wavelet transform to reduce the feature vector dimensionality. Finally, an LSTM is used to learn the class label of the images.

Cheng et al. [29] introduced a novel approach to classifying brain tumors from T1-weighted contrast-enhanced MRI images. They used the augmented tumor region to learn the model; in the augmented tumor region, the knowledge of the tissues surrounding the tumor region is utilized. The augmented tumor region is then split into ring-form subregions at increasingly fine resolutions. Each region is described by three sets of feature extractors, including an intensity histogram, a gray level co-occurrence matrix (GLCM) [30], and a bag-of-words (BoW) model. Finally, spatial pyramid matching (SPM) [31] is used to generate the final representation of the region.
Sajjad et al. [32] introduced deep learning-based multi-grade brain tumor classification. Their approach has three steps. In the first step, the input images are segmented into tumor regions using InputCascadeCNN [33]. Then, eight different augmentation techniques are applied to the extracted brain tumor region. In the third step, the region is fed into CNN networks to extract features and classify the input region. To this end, they used the pre-trained VGG-19 architecture [34] and fine-tuned it on the augmented data.

Afshar et al. [35] introduced an approach in which capsule networks [36] are used to classify brain tumors from MRI images. Deepak and Ameer [37] presented a method for brain tumor classification in which the pre-trained GoogleNet [38] is used to extract features from MRI images. Isunur and Kakarla [39] introduced a brain tumor segmentation method that utilizes a modified U-net model to segment the brain image into semantic regions; they also used adaptive thresholding as a post-processing step to improve the segmentation results. Sa et al. [40] introduced a fully automatic tumor detection and segmentation method utilizing active contours at multiple steps; to this end, the symmetry characteristics of the human brain anatomy are used. Rehman et al. [41] modified SegNet to reduce the loss of location information and spatial detail in the deeper layers for brain tumor segmentation; the resulting network is called BrainSeg-Net. They propose a feature enhancer (FE) block to extract middle-level features from the low-level ones in shallow layers and share them with the dense layers. The FE block is incorporated into the BrainSeg-Net architecture to achieve better performance in tumor identification.

Sharif et al. [42] introduced a model in which the pre-trained DenseNet201 model is fine-tuned. The output of the average pool layer is then fed, as a feature, into feature selection methods. They utilized two feature selection methods: Entropy-Kurtosis-based High Feature Values (EKbHFV) and a modified genetic algorithm (MGA) based on metaheuristics. The selected features are fused and fed into a multiclass cubic SVM classifier to obtain the final label. Kong et al. [43] introduced an approach that utilizes the 3D volume of MRI brain images to segment brain tissues. To this end, they introduced an effective method to generate supervoxels for the 3D MRI image; the supervoxels are then fed into a graph filter, used as a classifier, to detect the different types of tissue.

Sadad et al. [44] introduced a U-net architecture for brain tumor detection where ResNet50 is used as a backbone. They then utilized NASNet [45] to find the best architecture and evaluated the approach on the Figshare dataset, splitting it randomly into a training set (80%) and a test set (20%). They utilized well-known pre-trained architectures and referred to it as transfer learning, whereas, in the proposed approach, domain adaptation is used. Bodapati et al. [46] introduced a two-channel deep neural network architecture in which the MRI image is fed into InceptionResNetV2 and Xception networks, and their outputs are combined using an introduced pooling technique.

Moreover, utilizing the attention mechanism helps to focus on the knowledge of tumor regions. Zhu et al. [47] introduced a deep learning model for MRI segmentation of normal brain tissues. They modified U-net in two ways, calling the result Binary Channel Attention U-Net (BCAU-Net): first, they introduced a novel Binary Channel Attention Module (BCAM) into the skip connections of U-Net; second, they aggregate multiscale spatial information using spatial pyramid pooling (SPP) modules instead of the original average-pooling and max-pooling operations. Bodapati et al. [48] introduced the Multimodal Squeeze and Excitation model (MSENet) for brain tumor severity classification. It takes the learned feature descriptors of multiple pre-trained models as input and utilizes the attention mechanism in the squeeze and excitation blocks of MSENet to focus more on tumor regions during detection.

One of the challenges that has not received attention in recent works is that MRI brain image datasets contain images from multiple modalities, and the representation space of these modalities is different. For example, some images might be taken from the axial plane and others from the sagittal plane. One of the main contributions of this paper is that a modality adaptation technique is proposed assuming that inputs could be assigned to different modalities.

Multimodal learning. In recent years, multimodal learning has attracted much attention. Ngiam et al. [49] divide multimodal machine learning methods into three categories. (1) Multimodal fusion: the most important assumption is that all modalities are provided for both the training and test samples, and the aim is to fuse the knowledge of different modalities to obtain a stronger representation of each sample. The fusion is done in two ways: (a) fusion in the feature extraction step, where the fused features are fed into the subsequent layers to predict the output, or (b) fusion in the prediction step to produce the final output [50]. (2) Cross-modality learning: all modalities of the data are provided in the training step, while only one modality is available in the test phase. The main goal is to learn a new discriminative representation for one modality by utilizing the unlabeled data of the other modalities. (3) Shared representation learning: the available modalities of the training and test samples are different, and the main goal is to learn a new invariant representation across different modalities. In all categories, the goal is to transfer knowledge between different modalities with different distributions. The proposed approach lies in the third category.

In [51], human pose recovery is proposed using a multimodal deep autoencoder. A multi-layered deep neural network is used as a feature extractor, and a hypergraph Laplacian with low-rank representation is used to fuse the multimodal knowledge. Eitel et al. [52] introduced an RGB-D object recognition model using multimodal deep learning. They utilize two separate CNN networks as feature extractors, one for each modality, and these representations are then fed into a fusion network to combine the knowledge; the possibly different distributions between modalities are not considered. In [53], a deep learning-based framework called MildInt is presented to integrate multimodal longitudinal data; the feature representations of the modalities are simply concatenated and fed into the classification layers. Akbari et al. [54] presented an approach for image-phrase grounding using a multi-level multimodal common semantic space. They utilize deep feature maps at multiple levels (i.e., the outputs of different layers) and map each of them into a common space. Next, the correspondences between spatial regions in the image at different levels and words in the sentence at different positions are computed using the attention mechanism. Chen et al. [55] proposed a novel framework for domain adaptation to effectively adapt a segmentation network to an unlabeled target domain.

In [14], an adversarial domain adaptation technique is introduced. However, this technique is only for unimodal data. In the proposed approach, we modify this technique such that it can be applied to multimodal data. To do so, we offer an architecture of how adaptation can be made between modalities, and the proper loss functions are introduced as well.

3. Proposed approach

In this section, the proposed approach is demonstrated in detail. The proposed approach has two stages.
Fig. 1. The detailed architecture of different sub-networks of the proposed approach for image classification.

In the first stage, a feature extractor is designed for the multimodal training domain. In the second stage, a new multimodal domain adaptation technique is proposed for learning a reliable feature encoder network for the multimodal test data. In the following, the problem formulation is given first, and then each stage of the proposed approach is described.

3.1. Problem formulation

In this section, the problem formulation of the proposed approach is given. Let {(I_i^s, l_i^s)}_{i=1}^{n_s} denote the multimodal training (i.e., source) dataset, where I_i^s and l_i^s respectively represent the multimodal training samples and labels. The number of training samples is denoted by n_s, and the number of modalities observed in the training samples is denoted by m. For example, in MRI brain image analysis, sample I_i^s could be one image or multiple images (i.e., MRI images from multiple views). Each training sample could belong to one or more of these m modalities. Also, the multimodal test (i.e., target) samples are denoted by {I_i^t}_{i=1}^{n_t}, where n_t denotes the number of test samples. In this case, too, each test sample might belong to m different modalities. It should be noted that, in our setting, all modalities of a sample might not be available; each sample might belong to a different subset of modalities. This assumption is more general than the case in which all modalities are available for all samples. As a result of this assumption, the training and test domains contain different distributions of the MRI brain images. In this paper, it is assumed that the training and test samples are drawn from the training domain (χ^s) and the test domain (χ^t), respectively. In classical machine learning methods, it is assumed that the training (source) and test (target) samples are drawn from the same domain distribution (i.e., p(χ^s) ∼ p(χ^t)), whereas this assumption does not hold in many cases. This paper addresses these challenges by developing a new multimodal domain adaptation technique. The main goal is to find a model which takes an image and maps it into the label space, assuming that different inputs might belong to different modalities.

3.2. Multimodal feature encoder network

In this section, the proposed multimodal feature encoder architecture is explained. It should be noted that each sample might belong to m different modalities. Fig. 3 shows the overall schematic of the proposed architecture for three different modalities. As shown, the hidden layers are shared between all modalities, while some specific layers are considered for each modality. The shared layers perform the knowledge transfer between modalities in the proposed architecture. The intuition behind the proposed architecture is that there exists a shared feature subspace between modalities and, in addition, a specific feature subspace for each modality. To design the architecture of the feature encoder, depending on whether the task is classification or segmentation, the pre-trained ResNet50 or the U-net is used, respectively. Figs. 1 and 2 show the detailed architecture of the feature encoder and the modality task-specific layers. The multimodal loss function for training the encoder is defined as follows:

L = − ∑_{k=1}^{m} [ ∑_{i=1}^{n_s} δ_k(x_i^s) · L(y_i^s, ŷ_i^s) ]    (1)

where the function δ_k(·) maps to one if the input belongs to the kth modality and to zero otherwise, and L(·) denotes the categorical cross-entropy loss function (for the classification task). The predicted outputs ŷ are computed as follows:

ŷ_{i,k}^s = C_k(F_{s,k}(F_s(x_i^s)))    (2)

where F_s denotes the shared feature encoder network, F_{s,k} denotes the specific feature encoder of the kth modality, and C_k denotes the prediction layers of the kth modality.
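A minimal sketch of how such a shared/specific encoder and the masked loss of Eqs. (1)-(2) could be wired up in Keras (the paper's implementation uses TensorFlow/Keras); the layer sizes, modality count, and helper names are illustrative assumptions, not the authors' exact configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model, applications

NUM_MODALITIES = 3   # axial, sagittal, coronal (assumed)
NUM_CLASSES = 3      # meningioma, glioma, pituitary

def build_multimodal_encoder(input_shape=(224, 224, 3)):
    """Shared trunk F_s plus one specific head F_{s,k} and prediction
    layers C_k per modality, as in Eq. (2)."""
    inputs = layers.Input(shape=input_shape)
    # Shared feature encoder F_s (pre-trained ResNet50 backbone).
    backbone = applications.ResNet50(include_top=False, weights="imagenet",
                                     pooling="avg")
    shared = backbone(inputs)
    outputs = []
    for k in range(NUM_MODALITIES):
        # Modality-specific encoder F_{s,k} followed by prediction layers C_k.
        h = layers.Dense(256, activation="relu", name=f"specific_{k}")(shared)
        y = layers.Dense(NUM_CLASSES, activation="softmax", name=f"pred_{k}")(h)
        outputs.append(y)
    return Model(inputs, outputs)

def multimodal_loss(y_true, y_preds, modality_mask):
    """Masked cross-entropy in the spirit of Eq. (1):
    modality_mask[i, k] = 1 iff sample i belongs to modality k (the role of delta_k)."""
    cce = tf.keras.losses.CategoricalCrossentropy(
        reduction=tf.keras.losses.Reduction.NONE)
    total = 0.0
    for k, y_pred in enumerate(y_preds):
        per_sample = cce(y_true, y_pred)                 # L(y_i, y_hat_i)
        total += tf.reduce_sum(modality_mask[:, k] * per_sample)
    return total
```

For the segmentation task, the ResNet50 trunk would be replaced by a U-net and the prediction heads by per-pixel softmax layers, as described in the text.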
In medical image segmentation, class imbalance poses a significant challenge, since the target lesions often occupy a considerably smaller volume relative to the background. For the image segmentation loss function, we therefore use the Unified Focal loss (L_UF) introduced in [56].
Fig. 2. The detailed architecture of different sub-networks of the proposed approach for image segmentation.

Fig. 3. The proposed architecture for multimodal feature encoder (for three different modalities).

It reduces the lossy characteristics of medical image segmentation. This loss function generalizes Dice and cross entropy-based losses and is defined as follows:

L_UF = λ·L_aF + (1 − λ)·L_aFT    (3)

where λ controls the relative importance of the two components of the loss, and L_aF is an asymmetric focal loss, a modification of the cross-entropy loss that gives more importance to the rare class r [57]:

L_aF = − (1/N)·y_{i,k:r}^s·log(p_{t,r}) − (1/N)·∑_{c≠r} (1 − ŷ_{i,k:c}^s)^γ·log(ŷ_{i,k:r}^s)    (4)

where y_{i,k:r}^s denotes the rth index of the one-hot encoding of the ground-truth label of the ith sample, p_{t,c} is the vector of predicted values for each class, and the indices c and i iterate over all classes and pixels, respectively. Also, L_aFT denotes the asymmetric Focal Tversky loss, which is defined as follows:

L_aFT = ∑_{c≠r} (1 − mTI) + ∑_{c=r} (1 − mTI)^{1−γ}    (5)

where mTI is a modified Tversky index; it is related to Dice but is suitable for imbalanced outputs. mTI is defined as follows:

mTI = ∑_{i=1}^{N} p_{0i}·g_{0i} / ( ∑_{i=1}^{N} p_{0i}·g_{0i} + δ·∑_{i=1}^{N} p_{0i}·g_{1i} + (1 − δ)·∑_{i=1}^{N} p_{1i}·g_{0i} )    (6)

where p_{0i} is the probability that the ith pixel belongs to the foreground class and p_{1i} is the probability that it belongs to the background class. Also, g_{0i} (g_{1i}) takes the value 1 (0) for foreground and 0 (1) for background. The parameter δ controls the ratio of false-negative and false-positive rates. It should be noted that the hyperparameters λ, γ, and δ are optimized during the training step.
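A compact sketch of how the loss of Eqs. (3)-(6) could be implemented for a binary foreground/background mask, assuming TensorFlow tensors of shape (batch, H, W); the simplified two-class focal term and the default hyperparameter values are illustrative assumptions, not the tuned values used in the paper:

```python
import tensorflow as tf

def modified_tversky_index(p_fg, g_fg, delta=0.6, eps=1e-7):
    """mTI of Eq. (6). p_fg: predicted foreground probabilities,
    g_fg: binary ground-truth foreground mask of the same shape."""
    p_bg, g_bg = 1.0 - p_fg, 1.0 - g_fg
    tp = tf.reduce_sum(p_fg * g_fg)          # sum_i p0i * g0i
    fp = tf.reduce_sum(p_fg * g_bg)          # sum_i p0i * g1i, weighted by delta
    fn = tf.reduce_sum(p_bg * g_fg)          # sum_i p1i * g0i, weighted by 1-delta
    return (tp + eps) / (tp + delta * fp + (1.0 - delta) * fn + eps)

def asymmetric_focal_tversky_loss(p_fg, g_fg, delta=0.6, gamma=0.5):
    """Two-class reading of L_aFT in Eq. (5): the rare (foreground) class
    receives the (1 - mTI)^(1 - gamma) term."""
    mti = modified_tversky_index(p_fg, g_fg, delta)
    return (1.0 - mti) + tf.pow(1.0 - mti, 1.0 - gamma)

def asymmetric_focal_loss(p_fg, g_fg, gamma=0.5, eps=1e-7):
    """Simplified two-class version of L_aF in Eq. (4): plain cross-entropy for
    the rare foreground class, focal-damped cross-entropy for the background."""
    p_fg = tf.clip_by_value(p_fg, eps, 1.0 - eps)
    ce_fg = -g_fg * tf.math.log(p_fg)                                       # rare class, undamped
    ce_bg = -(1.0 - g_fg) * tf.pow(p_fg, gamma) * tf.math.log(1.0 - p_fg)   # background, damped
    return tf.reduce_mean(ce_fg + ce_bg)

def unified_focal_loss(p_fg, g_fg, lam=0.5, delta=0.6, gamma=0.5):
    """L_UF of Eq. (3): convex combination of the two terms."""
    return (lam * asymmetric_focal_loss(p_fg, g_fg, gamma)
            + (1.0 - lam) * asymmetric_focal_tversky_loss(p_fg, g_fg, delta, gamma))
```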
3.3. The proposed multimodal domain adaptation

This section describes the proposed multimodal domain adaptation technique in detail. In domain adaptation, the main goal is to find a new space in which the training and test domains have the same distribution. In recent years, domain adaptation has received increasing attention. To adapt the knowledge between modalities (in most cases, the number of modalities is greater than two), there are different options: (1) use an existing knowledge adaptation technique and adapt each pair of modalities; (2) use an existing knowledge adaptation technique to adapt each modality to all the other modalities; or (3) avoid these difficulties by proposing a new technique in which knowledge adaptation between modalities is done in a single step. In this paper, the third option is chosen. To this end, the proposed technique is based on the adversarial domain adaptation technique [13], whose overall schematic is shown in Fig. 4. Adversarial domain adaptation aligns the feature distributions across the training and test domains in a two-player minimax game. It consists of two loss functions, the discriminator loss and the mapping loss, which are optimized iteratively until convergence is achieved.
Fig. 4. The overall schematic of the proposed multimodal domain adaptation technique.

The classic adversarial domain adaptation approach consists of a discriminator network that discriminates samples from the two domains. In the proposed approach, there are two viewpoints for designing the discriminator network: in the first viewpoint, we can extend the discriminator output to k dimensions so that it represents the joint distribution over modalities; in the second viewpoint, we design k different discriminator networks, each of which discriminates the corresponding modality between the two domains.

The intuition behind the proposed multimodal domain adaptation technique is that the knowledge transfer should happen (1) between the corresponding modalities and (2) between the shared spaces of all modalities. This intuition leads us to develop a new architecture for domain adaptation in the presence of multimodal data. To this end, a specific discriminator network for each modality is designed to distinguish the training and test samples belonging to the corresponding modality. In this case, the discriminator loss function is defined as follows:

L_adv,D = − ∑_{k=1}^{m} ∑_{x_k^s ∼ P(x_k^s)} [ log D_k(F_{s,k}(x_k^s)) ]·W(x_k^s) − E_{x_k^t ∼ P(x_k^t)} [ log(1 − D_k(F_{t,k}(x_k^t))) ]    (7)

where D_k is the domain discriminator network of the kth modality. The operator ∼ in x ∼ p(x) means that the random variable x follows the probability distribution p(x). The probability distributions of the kth modality for the training samples and the test samples are denoted by P(x_k^s) and P(x_k^t), respectively. The discriminator assigns the label "one" to samples drawn from the kth modality of the training domain and "zero" to samples drawn from the kth modality of the test domain. To sum up, the loss function in Eq. (7) determines whether a data point is drawn from the training domain or the test domain, and it sums over the modality-specific discriminators. In Eq. (7), we also assign a weight to each source sample to prevent negative transfer. The weight function is defined as follows:

W(x_k^s) = ∑_{x_k^t ∼ P(x_k^t)} exp( − d(x_k^s, x_k^t) / (2σ) )    (8)

where d(·,·) denotes the distance function, which in this case is the Euclidean distance, d(x_k^s, x_k^t) = ( ∑_{i=1}^{n_F} (x_k^s(i) − x_k^t(i))^2 )^{1/2}. The parameter σ in Eq. (8) denotes the variance, which measures the deviation of the distance function d(·,·) from its average. The multimodal adversarial mapping loss of the proposed domain adaptation technique is defined as follows:

L_adv,m = ∑_{k=1}^{m} [ − E_{x_k^t ∼ P(x_k^t)} [ log D_k(F_{t,k}(x_k^t)) ] ]    (9)

Specifically, it is assumed that there exists a feature space that is shared by both the training and the test domains and is discriminative enough for predicting the output. By learning a feature extractor F that maps both the training and test inputs to the same feature space, the classifier learned on the training data can be utilized to classify samples drawn from the test domain.

3.4. Inference

In this section, the inference step of the proposed approach is explained. A test sample is fed into the test feature encoder network. It should be noted that the test feature encoder has one branch per modality. The obtained feature vectors for each modality are fed into the corresponding prediction layers to obtain the corresponding labels for each modality. If the modality of the test sample is known, the predicted label of the corresponding modality branch is taken as the output; if it is unknown, a voting mechanism is utilized to assign a label to the test sample.
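A minimal sketch of one update of the per-modality adversarial adaptation described by Eqs. (7)-(9), together with the voting-based inference of Section 3.4; the optimizer handles, network objects, and the way the sample weight is computed in feature space are illustrative assumptions rather than the authors' exact implementation:

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

def weight_source_sample(f_src, f_tgt_batch, sigma=0.1):
    """W(x_k^s) of Eq. (8), computed here between feature vectors."""
    d = tf.norm(f_tgt_batch - f_src[None, :], axis=1)        # Euclidean distances
    return tf.reduce_sum(tf.exp(-d / (2.0 * sigma)))

def adaptation_step(src_batches, tgt_batches, F_s, F_t, discriminators,
                    d_opt, f_opt, sigma=0.1):
    """One iteration of the two-player game. src_batches[k]/tgt_batches[k] hold
    samples of modality k; F_s is the frozen source encoder, F_t the trainable
    target encoder (standing in for F_t,k composed with the shared layers);
    discriminators[k] is D_k and is assumed to end in a sigmoid."""
    # Discriminator update, Eq. (7).
    with tf.GradientTape() as tape:
        d_loss = 0.0
        for k, D_k in enumerate(discriminators):
            f_src = F_s(src_batches[k], training=False)
            f_tgt = F_t(tgt_batches[k], training=False)
            w = tf.stack([weight_source_sample(f, f_tgt, sigma) for f in f_src])
            d_loss += tf.reduce_mean(w * tf.squeeze(
                -tf.math.log(D_k(f_src) + 1e-7)))                 # source labelled "one"
            d_loss += bce(tf.zeros_like(D_k(f_tgt)), D_k(f_tgt))  # target labelled "zero"
    d_vars = [v for D in discriminators for v in D.trainable_variables]
    d_opt.apply_gradients(zip(tape.gradient(d_loss, d_vars), d_vars))

    # Target-encoder (mapping) update, Eq. (9): try to fool every D_k.
    with tf.GradientTape() as tape:
        m_loss = 0.0
        for k, D_k in enumerate(discriminators):
            f_tgt = F_t(tgt_batches[k], training=True)
            m_loss += bce(tf.ones_like(D_k(f_tgt)), D_k(f_tgt))
    f_opt.apply_gradients(zip(tape.gradient(m_loss, F_t.trainable_variables),
                              F_t.trainable_variables))
    return d_loss, m_loss

def predict_with_voting(x, F_t, heads):
    """Section 3.4: run all modality branches and majority-vote the labels."""
    votes = [tf.argmax(head(F_t(x, training=False)), axis=-1) for head in heads]
    votes = tf.stack(votes, axis=1)                               # (batch, num_modalities)
    return tf.map_fn(lambda v: tf.argmax(tf.math.bincount(tf.cast(v, tf.int32))),
                     votes, fn_output_signature=tf.int64)
```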
Fig. 5. Some samples of the Figshare brain dataset: (a) brain meningioma tumor in three different planes; (b) different brain tumor types in the axial plane.

4. Experimental results

In this section, some experiments are designed to evaluate the proposed approach. Two datasets are utilized: the brain Figshare and IBSR datasets. In the following, each of these datasets is described, and the obtained results are discussed in detail. To show the effectiveness of each module of the proposed approach, an ablation study is done. To do so, three base methods are chosen to evaluate the power of the proposed modality adaptation technique: Base1, Base2, and our approach (without multimodal DA). Base1 only uses the backbone structure of the proposed approach without utilizing the modality knowledge or the adaptation technique; hence, in Base1, as in the proposed approach, ResNet50 and U-net are respectively used for constructing the basic backbone framework for classification and segmentation. In Base2, the data is fed into the network without considering the modality knowledge, and the adversarial domain adaptation technique is used to adapt the knowledge between the training and test sets. In our approach (without multimodal DA), the architecture is completely similar to the proposed approach, with the difference that it does not utilize the modality adaptation technique. It should be noted that, in Base1 and Base2, the modality knowledge is waived.

We run the program on a computer with an Intel(R) Core(TM) i7-7700HQ CPU, an NVIDIA Quadro M6000 24 GB GPU, and 64 GB of DDR4 RAM. Our method is implemented using Python 3.6, TensorFlow [58], and Keras [59]. Each iteration takes around 0.009 s during training with a minibatch size of 256, and Adam is used as the optimization algorithm. The reported time per step (iteration) is computed using Keras with the TensorFlow backend by measuring the time taken to train the models.

Hyper-parameter optimization is also performed: the variance of the distance function in the sample weight (Eq. (8)) is searched over [0.01, 0.05, 0.1, 0.2, 0.5]; the hyperparameter γ over [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5]; the hyperparameter λ over [0:0.1:1]; the hyperparameter δ over [0.1:0.1:0.9]; the learning rate for training the feature encoder network over [0.1, 0.01, 0.001, 0.0001]; the learning rate for the test feature encoder network over [0.001, 0.0005, 0.0001, 0.00005, 0.00001]; and the learning rate for the discriminator layers over [0.001, 0.0005, 0.0001, 0.00005, 0.00001].
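The search space above could be encoded, for example, as a plain dictionary handed to a grid or random search routine; this is only an illustrative encoding of the ranges quoted in the text, not the authors' tuning script:

```python
# Search spaces quoted in the text; [a:step:b] ranges are expanded explicitly.
hyperparameter_grid = {
    "sigma": [0.01, 0.05, 0.1, 0.2, 0.5],                  # variance in Eq. (8)
    "gamma": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5],
    "lambda": [round(0.1 * i, 1) for i in range(11)],       # 0, 0.1, ..., 1
    "delta": [round(0.1 * i, 1) for i in range(1, 10)],     # 0.1, ..., 0.9
    "lr_train_encoder": [0.1, 0.01, 0.001, 0.0001],
    "lr_test_encoder": [0.001, 0.0005, 0.0001, 0.00005, 0.00001],
    "lr_discriminator": [0.001, 0.0005, 0.0001, 0.00005, 0.00001],
}
```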
4.1. Brain Figshare dataset

The brain Figshare dataset contains brain T1-weighted CE-MRI images. It was acquired from 233 patients at Nanfang Hospital, Guangzhou, China, and General Hospital, Tianjin Medical University, China, over five years, from 2005 to 2010. The dataset contains 3064 images of three types of brain tumors: meningiomas (708 images), gliomas (1426 images), and pituitary tumors (930 images). These images are taken in three planes: the sagittal (1025 images), axial (994 images), and coronal (1045 images) planes. The dataset is publicly available.¹ In the proposed approach, the three anatomical planes of the brain are considered as three modalities. Some samples of this dataset are shown in Fig. 5. As shown, the appearance of the MRI brain image and the tumor differs across modalities. In Fig. 5-a, three different planes of a brain meningioma tumor are shown; the appearance of the meningioma tumor clearly differs across planes. As a result, involving modality knowledge might make learning a reliable model easier than when the modality knowledge is not considered. This leads us to propose a multimodal approach for brain image detection.

In this experiment, two tasks on the Figshare dataset are considered: classification and segmentation. In the former, the goal is to assign a label to each image, while in the latter, the goal is to detect the border of the tumor in each image and determine the class label of the tumor.

In this paper, two approaches are used for splitting the datasets: patient-level and image-level. In patient-level splitting, the 233 patients are randomly split into five subsets of equal size. This split guarantees that slices from the same patient do not appear simultaneously in the training and test sets; this setting is similar to the experiments done in [29,60]. In image-level splitting, the images are randomly split into the training and test sets. In this case, some approaches use k-fold cross-validation [61], while other approaches randomly split the images into the training and test sets. To have a fair comparison, we have used k-fold cross-validation as the evaluation technique; in the image-level division technique, 612 images are used for validation in each fold.

¹ Can be downloaded from http://dx.doi.org/10.6084/m9.figshare.1512427.
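The patient-level splitting described above corresponds to grouped cross-validation; a minimal sketch using scikit-learn, with placeholder arrays standing in for the images, labels, and patient IDs (this is not part of the authors' code):

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Placeholders for the 3064 images, their tumor classes, and patient IDs.
images = np.arange(3064)                      # stand-in for image data/paths
labels = np.random.randint(0, 3, size=3064)   # meningioma / glioma / pituitary
patient_ids = np.random.randint(0, 233, size=3064)

gkf = GroupKFold(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(gkf.split(images, labels, groups=patient_ids)):
    # Slices of one patient never appear on both sides of the split.
    print(f"fold {fold}: {len(train_idx)} training images, {len(test_idx)} test images")
```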
Table 1
The obtained results of the proposed approach and the base methods on the Figshare dataset. The two division techniques are used to evaluate the approach.
Method Accuracy Division technique
Base 1 92.34 ± 2.58%
Base 2 96.12 ± 1.97% Patient-level
Our approach (without multimodal DA) 94.09 ± 2.31%
Our approach 97.35 ± 1.80%
Base 1 90.42 ± 2.58%
Base 2 93.74 ± 2.58% Image-level
Our approach (without multimodal DA) 91.89 ± 2.58%
Our approach 96.21 ± 0.87%

Table 2
Classification accuracies (%) of the different methods on the Figshare dataset.
Method Accuracy Division technique
[29] 91.14% k-fold cross-validation
[65] 94.23% Holdout technique (70-30)
[65] 92.61% k-fold cross-validation
[63] 93.68% –
[35] 86.56% –
[64] 95.98% k-fold cross-validation
[62] - NS-EMFSE + CNN + SVM 95.62% 5-fold cross-validation
[46] 95.23% k-fold cross-validation
[48] - MSENet 96.05% k-fold cross-validation
Our approach (image-level) 96.21 ± 0.87% k-fold cross-validation (image-level)
Our approach (patient-level) 97.35 ± 1.80% k-fold cross-validation (patient-level)

4.1.1. Classification task
The obtained results of the base methods and the proposed approach are shown in Table 1. Accuracy is used to evaluate the proposed approach, and, as shown in Table 1, the two division techniques are used. If the patient level is used as the division technique, the proposed approach gets a 1.14% improvement over the performance obtained with the image-level division technique. By comparing the results of Base 1, Base 2, the proposed approach without domain adaptation, and the proposed approach, it is concluded that applying the proposed domain adaptation technique to unimodal and multimodal networks improves the performance over models that do not use the domain adaptation technique. Also, comparing the two base methods with the proposed approach shows that utilizing the modality knowledge improves the performance over models that do not utilize it.

The proposed approach is also compared with recent works in MRI brain image classification in Table 2. As shown, the proposed approach performs better than all comparable approaches; some of these approaches use the holdout technique rather than k-fold cross-validation. The proposed approach gets a 1.73% improvement over the best comparable approach.

It should be noted that the compared methods [35,62-64] use deep convolutional networks to extract features. In [35], CapsNet is selected to extract features since it is robust to rotation and affine transformations. The authors in [64] use their proposed MidResBlock encoder layer in their architecture. None of the above approaches utilizes the modality knowledge of the samples, whereas the proposed approach considers the challenge that each sample might have different modalities with respect to the other samples. Also, to show the ability of the proposed modality adaptation technique, we design Base2, which utilizes the adversarial adaptation technique. MSENet, the Multimodal Squeeze and Excitation model [48], is a multimodal approach that takes multiple representations of a given tumor image and predicts the severity level of the tumor. Our approach gets a 1.3% improvement over MSENet. Overall, the comparison in Tables 1 and 2 shows that the proposed approach successfully exploits the multimodality knowledge through the proposed modality adaptation technique.

Table 3
The reported DICE coefficient of the different methods on the Figshare dataset.
Method Meningioma Glioma Pituitary
[39] 0.8243 0.6077 0.7847
[40] 0.8997 0.6554 0.8395
Base 1 0.8867 0.6133 0.8235
Base 2 0.8756 0.6502 0.8099
Our approach (without multimodal DA) 0.8997 0.6802 0.8452
Our approach 0.9459 0.7241 0.9108

4.1.2. Segmentation task
The segmentation task aims to determine all pixels that belong to the tumor region in the MRI brain image. In Fig. 6, some examples of the MRI brain image and the corresponding tumor mask are given. In this experiment, DICE is chosen as the evaluation metric, which is defined as follows:

DICE = 2·TP / (2·TP + FP + FN)    (10)

where TP denotes the pixels truly detected as tumor, FP denotes the pixels wrongly detected as tumor, and FN denotes the pixels wrongly detected as background. The DICE coefficient ranges from zero to one, where one denotes the largest similarity between the predicted mask and the ground truth.

The obtained results of the proposed approach and the base methods are given in Table 3, together with the other methods in the literature. The proposed approach respectively gets 0.0288, 0.0363, and 0.0347 improvements in the meningioma, glioma, and pituitary tumor classes over our approach without the multimodal domain adaptation, which confirms that the proposed multimodal domain adaptation technique effectively learns a reliable feature encoder for the test data. The proposed approach also respectively gets 0.0313, 0.0456, and 0.0699 improvements in the meningioma, glioma, and pituitary tumor classes over the Base 2 approach. This shows that the multimodal feature encoder and the multimodal domain adaptation technique are successful in learning and transferring knowledge.
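For reference, Eq. (10) amounts to the following computation on binary masks (a small NumPy sketch, not taken from the authors' code):

```python
import numpy as np

def dice_coefficient(pred_mask, gt_mask):
    """DICE of Eq. (10) for binary masks: 2*TP / (2*TP + FP + FN)."""
    pred_mask = pred_mask.astype(bool)
    gt_mask = gt_mask.astype(bool)
    tp = np.logical_and(pred_mask, gt_mask).sum()
    fp = np.logical_and(pred_mask, ~gt_mask).sum()
    fn = np.logical_and(~pred_mask, gt_mask).sum()
    return 2.0 * tp / (2.0 * tp + fp + fn) if (tp + fp + fn) else 1.0
```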
Fig. 6. Some examples of the brain image and the corresponding tumor region.

4.2. IBSR dataset

The IBSR dataset contains 18 T1-weighted MRI images taken from 4 healthy females and 14 healthy males with ages ranging from 7 to 71 years. To have a fair comparison, the dataset is divided into training and test sets as in the state-of-the-art approaches: the first 12 subjects are used as the training set, and the remaining 6 subjects are considered as the test set. Within the training set, nine subjects are used for training and the remaining three for validation. The MRIs are preprocessed by skull-stripping, normalization, and bias field correction. For each MRI, a ground truth is provided in which each pixel is labeled with a tissue type: background, CSF, GM, or WM. In IBSR, the ground truth labels are provided by experts. IBSR volumes are provided in three different planes: axial, sagittal, and coronal.

The obtained results are given in Table 4, in which the results for SegNet, U-Net, and [66] are taken from [66]. In all cases, the proposed approach does better than all comparable approaches.

To statistically evaluate the significance of the improvement of our method over the compared methods, the paired t-test is applied to the class DICE coefficients at the significance level of 0.05. In this test, the null hypothesis states that the two algorithms have the same performance, and the alternative hypothesis is that they have different performances. In Table 4, the paired t-test results between our approach and the other compared approaches on the IBSR dataset are reported. For each comparable approach, the obtained p-value is below the 0.05 significance level, which shows that the null hypothesis is rejected: there is a significant difference between the DICE coefficients of the proposed approach and the other comparable approaches.
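The significance test above can be reproduced with a standard paired t-test on the per-class DICE scores; a small sketch using SciPy, with the "Our approach" and "U-Net" rows of Table 4 as example score vectors (this is not the authors' analysis script):

```python
from scipy import stats

# Per-class DICE coefficients of two methods on the same nine plane/tissue cells of Table 4.
dice_ours = [0.97, 0.98, 0.92, 0.93, 0.96, 0.89, 0.94, 0.98, 0.91]
dice_unet = [0.89, 0.91, 0.84, 0.86, 0.89, 0.80, 0.88, 0.90, 0.83]

t_stat, p_value = stats.ttest_rel(dice_ours, dice_unet)
print(f"paired t-test: t = {t_stat:.3f}, p = {p_value:.2e}")
# Reject the null hypothesis of equal performance when p is below 0.05.
```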
Table 5 compares the proposed approach with different methods on the IBSR dataset. Valverde et al. [67] compared ten methods, including FAST, SPM5, SPM8, GAMIXTURE, ANN, FCM, KNN, SVPASEG, FANTASM, and PVC, on the IBSR dataset to investigate the effect of SCSF ground-truth voxels on performance. A comparison with the best-performing methods on WM, GM, and CSF is presented in Table 5. As shown, the proposed approach performs better than the other comparable approaches.

5. Conclusion

In this paper, we addressed multimodal brain image analysis. In most cases, medical images are multimodal, where modality refers to the different ways by which knowledge is obtained. The literature assumes that all modalities are available for every sample, while this assumption is often not justified. Hence, in this paper, it is assumed that only some modalities are available for each sample. Also, each brain region (which might include a brain tumor) appears differently in each modality. These challenges lead us to utilize knowledge transfer between and within modalities. To this end, two main contributions are made. First, a new multimodal feature encoder network is proposed in which knowledge transfer between modalities is considered. Then, a multimodal domain adaptation technique is proposed to learn a feature encoder for the test domain that considers the knowledge transfer between and within modalities. We applied the proposed approach to two datasets: the brain Figshare dataset and the IBSR dataset. On the Figshare dataset, brain tumor classification and detection are investigated, and on the IBSR dataset, brain image segmentation is examined. By comparing the obtained results with the state-of-the-art approaches, we conclude that the proposed approach performs better than the comparable approaches.

The main advantage of the proposed approach is that it considers the real situation in which MRI brain image datasets contain images from multiple modalities, and it takes into account the fact that the representation space of each modality is different. Another advantage is that this is the first time in this research area that a modality adaptation technique is utilized.
Table 4
The reported DICE coefficient of the different methods on the IBSR dataset.
Methods Axial (WM GM CSF) Sagittal (WM GM CSF) Coronal (WM GM CSF) Paired t-test (p-value)
SegNet 0.72 0.75 0.68 0.71 0.74 0.65 0.70 0.73 0.66 7.92 × 10^-11
U-Net 0.89 0.91 0.84 0.86 0.89 0.80 0.88 0.90 0.83 2.77 × 10^-4
[66] 0.91 0.93 0.85 0.89 0.91 0.81 0.90 0.92 0.83 0.45 × 10^-2
Base 1 0.78 0.79 0.81 0.83 0.76 0.77 0.81 0.79 0.78 3.41 × 10^-9
Base 2 0.76 0.79 0.81 0.81 0.79 0.78 0.82 0.79 0.84 1.02 × 10^-8
Our approach (without multimodal DA) 0.81 0.86 0.82 0.79 0.91 0.80 0.91 0.87 0.82 7.32 × 10^-5
Our approach 0.97 0.98 0.92 0.93 0.96 0.89 0.94 0.98 0.91 –

Table 5
The reported DICE coefficient of the different methods on the IBSR dataset.
Method WM GM CSF
State-of-the-art [67] 0.89 ± 0.02 0.91 ± 0.01 0.79 ± 0.08
BCAU-Net [47] 0.90 0.91 0.85
[43] 0.85 ± 0.01 0.87 ± 0.03 0.57 ± 0.08
[68] 0.90 ± 0.01 0.91 ± 0.03 0.79 ± 0.03
[66] 0.88 ± 0.03 0.89 ± 0.05 0.87 ± 0.03
[69] 0.87 ± 0.03 0.92 ± 0.02 0.57 ± 0.19
[70] 0.87 ± 0.02 0.90 ± 0.01 0.57 ± 0.13
[71] 0.89 ± 0.02 0.86 ± 0.01 0.67 ± 0.03
Our approach 0.94 ± 0.02 0.94 ± 0.04 0.93 ± 0.02

The shared knowledge is preserved in the proposed adaptation technique, and the complementary knowledge is encoded. The proposed approach could be applied to other medical image segmentation problems, such as detecting breast cancer tumor tissue in mammogram images. Each mammogram image includes four different views: the right craniocaudal (RCC) view, right mediolateral oblique (RMLO) view, left craniocaudal (LCC) view, and left mediolateral oblique (LMLO) view. However, in many cases, all four views are not provided for each patient. Hence, the proposed approach could be applied to this application as well.

As mentioned, many advanced techniques for multimodal brain tumor detection have been proposed. In these techniques, the distribution difference between different modes is not considered. The proposed approach considers the adaptation technique between modalities, which is the main reason it achieves the performance improvement. The ablation study (comparison between the proposed approach and the proposed approach without domain adaptation) reported in Tables 1 and 3 confirms this statement.

One disadvantage of the proposed approach is that, if the number of modalities increases, the number of discriminator sub-networks also increases, which leads to more learnable parameters and more computational complexity. Moreover, if the number of samples in one of the modalities is low, the corresponding discriminator sub-network does not learn reliably. A related limitation is an unbalanced distribution of data across modalities: it can prevent the corresponding modality's feature extractor from being learned appropriately, and consequently, the modality adaptation is not performed correctly. This challenge could be addressed in future work. Another disadvantage of the proposed approach is that the number of hyperparameters to be optimized is large, leading to a large amount of computation time due to the combinatorial explosion of hyperparameters. Also, the proposed approach does not consider the impact of negative transfer of knowledge in the modality adaptation technique; if less related knowledge is transferred during modality adaptation, it hinders the final performance of the proposed approach. These issues will be considered in future studies.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Code and data are uploaded in CodeOcean. The uploaded capsule in CodeOcean is approved.

Acknowledgment

Parvin Razzaghi and Karim Abbasi would like to acknowledge the Iranian National Science Foundation (INSF), Iran, under Grant No. 97026018.

References

[1] T. Baltrušaitis, C. Ahuja, L.P. Morency, Multimodal machine learning: A survey and taxonomy, IEEE Trans. Pattern Anal. Mach. Intell. 41 (2) (2018) 423–443.
[2] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, A.Y. Ng, Multimodal deep learning, in: ICML, 2011.
[3] J. Gao, P. Li, Z. Chen, J. Zhang, A survey on deep learning for multimodal data fusion, Neural Comput. 32 (5) (2020) 829–864.
[4] T. Kim, B. Kang, M. Rho, S. Sezer, E.G. Im, A multimodal deep learning method for android malware detection using various features, IEEE Trans. Inf. Forensics Secur. 14 (3) (2018) 773–788.
[5] J. Venugopalan, L. Tong, H.R. Hassanzadeh, M.D. Wang, Multimodal deep learning models for early detection of Alzheimer's disease stage, Sci. Rep. 11 (1) (2021) 1–13.
[6] Z. Guo, X. Li, H. Huang, N. Guo, Q. Li, Deep learning-based image segmentation on multimodal medical imaging, IEEE Trans. Radiat. Plasma Med. Sci. 3 (2) (2019) 162–169.
[7] M. Lorenzi, et al., Multimodal image analysis in Alzheimer's disease via statistical modelling of non-local intensity correlations, Sci. Rep. 6 (22161) (2016).
[8] X. Xu, D. Shan, G. Wang, X. Jiang, Multimodal medical image fusion using PCNN optimized by the QPSO algorithm, Appl. Soft Comput. 46 (2016) 588–595.
[9] G. Bhatnagar, Q.M.J. Wu, Z. Liu, Directive contrast based multimodal medical image fusion in NSCT domain, IEEE Trans. Multimedia 15 (5) (2013) 1014–1024.
[10] R. Singh, A. Khare, Fusion of multimodal medical images using Daubechies complex wavelet transform - A multiresolution approach, Inf. Fusion 19 (2014) 49–60.
[11] X. Zhu, H.I. Suk, S.W. Lee, D. Shen, Subspace regularized sparse multi-task learning for multi-class neurodegenerative disease identification, IEEE Trans. Bio-Med. Eng. 63 (3) (2016) 607–618.
[12] P. Razzaghi, K. Abbasi, M. Shirazi, N. Shabani, Modality adaptation in multimodal data, Expert Syst. Appl. 179 (2021) 115126.
[13] E. Tzeng, J. Hoffman, K. Saenko, T. Darrell, Adversarial discriminative domain adaptation, in: IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7167–7176.
[14] H.K. Hsu, et al., Progressive domain adaptation for object detection, in: IEEE/CVF Winter Conference on Applications of Computer Vision, 2020, pp. 749–757.
[15] P. Razzaghi, P. Razzaghi, K. Abbasi, Transfer subspace learning via low-rank and discriminative reconstruction matrix, Knowl.-Based Syst. 163 (2019) 174–185.
[16] M. Wang, W. Deng, Deep visual domain adaptation: A survey, Neurocomputing 312 (2018) 135–153.
P. Razzaghi, K. Abbasi, M. Shirazi et al. Applied Soft Computing 129 (2022) 109631
