0% found this document useful (0 votes)
19 views20 pages

2022 Performance Enhancement of MRI-based Brain Tumor Classification Using Suitable Segmentation Method and Deep Learning-Based Ensemble Algorithm

Uploaded by

Ansuman Acharya
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
19 views20 pages

2022 Performance Enhancement of MRI-based Brain Tumor Classification Using Suitable Segmentation Method and Deep Learning-Based Ensemble Algorithm

Uploaded by

Ansuman Acharya
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 20

Biomedical Signal Processing and Control 78 (2022) 104018

Contents lists available at ScienceDirect

Biomedical Signal Processing and Control


journal homepage: www.elsevier.com/locate/bspc

Performance enhancement of MRI-based brain tumor classification using


suitable segmentation method and deep learning-based ensemble algorithm
Gopal S. Tandel a, b, *, Ashish Tiwari a, O.G. Kakde c
a
Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur 440010, India
b
School of Computer Science and Engineering, VIT Bhopal University, India
c
Indian Institute of Information Technology, Nagpur 440006, India

A R T I C L E I N F O A B S T R A C T

Keywords: Glioma is the most common brain tumor in humans. Accurate stage estimation of the tumor is essential for
Ensemble treatment planning. The biopsy is the gold standard method for this purpose. However, it is an invasive pro­
Majority voting cedure, which can prove fatal for patients, if a tumor is present deep inside the brain. Therefore, a magnetic
Magnetic resonance imaging
resonance imaging (MRI) based non-invasive method is proposed in this paper for low-grade glioma (LGG) versus
Brain tumor
Deep learning
high-grade glioma (HGG) classification. To maximize the above classification performance, five pre-trained
Classification convolutional neural networks (CNNs) such as AlexNet, VGG16, ResNet18, GoogleNet, and ResNet50 are
Transfer learning assembled using a majority voting mechanism. Segmentation methods require human intervention and addi­
tional computational efforts. It makes computer-aided diagnosis tools semi-automated. To analyze the perfor­
mance effect of segmentation methods, three segmentation methods such as region of interest MRI segmentation
(RSM) and skull-stripped MRI segmentation (SSM), and whole-brain MRI (WBM) (non-segmentation) data were
compared using above mentioned algorithm. The highest classification accuracy of 99.06 ± 0.55 % was observed
on the RSM data and the lowest accuracy of 98.43 ± 0.89 % was observed on the WSM data. However, only a
0.63 % improvement was found in the accuracy of the RSM data against the WBM data. This shows that deep
learning models have an incredible ability to extract appropriate features from images. Furthermore, the pro­
posed algorithm showed 2.85 %, 1.39 %, 1.26 %, 2.66 %, and 2.33 % improvement in the average accuracy of
the above three datasets over the AlexNet, VGG16, ResNet18, GoogleNet, and ResNet50 models, respectively.

1. Introduction research, IDH mutations were observed in 12 % of glioblastomas (HGG)


[7]and 70 % to 80 % of LGG [8]. IDH mutation will decide the patient
Cancer is the second leading cause of mortality worldwide after survival, IDH wild-type gliomas are more harmful than IDH mutated
cardiovascular diseases [1]. In various cancers, brain tumors are gliomas and have less survival probability [9]. To categorize glioma
considered to be one of the deadliest diseases to their aggressive nature, subtype IDH mutation, the tissue or cell structure of the tumor sample
heterogeneous characteristics, and poor survival rates. Different types of needs to be analyzed through microscopic examination. This procedure
brain tumors exist and they are named based on different factors such as is known as a biopsy. However, this is an invasive procedure, which was
location, texture, shape, and aggressiveness [2,3]. Among all the brain proven life-threatening to brain tumor patients when a tumor exists deep
tumors, glioma is the most common brain tumor. Symptoms and sur­ inside of the brain. Thus, other noninvasive alternative needs to be
vival of glioma patients are depending on the location and their subtype. searched.
The world health organization (WHO), grades glioma into four cate­ Recently in many studies [10–13], Magnetic Resonance Images
gories (grades I-IV) based on the aggressiveness of the tumor cells [4,5]. (MRIs) are proposed as a second alternative (non-invasive) to the
WHO grade I and II is usually referred to as low-grade gliomas (LGG), identification of IDH mutation and grade the gliomas into their subtype.
whereas WHO grade III and IV are referred to as high-grade gliomas However, this is challenging because IDH mutation information is at the
(HGG) [6].isocitrate dehydrogenase (IDH) mutations play a significant molecular level. Even medical experts cannot easily get such informa­
role in prognosis, diagnosis, and guidance for clinical decisions. In tion from MRI. MRI is the radiation-free non-invasive imaging modality

* Corresponding author at: Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur 440010, India.
E-mail addresses: [email protected] (G.S. Tandel), [email protected] (A. Tiwari), [email protected] (O.G. Kakde).

https://doi.org/10.1016/j.bspc.2022.104018
Received 9 March 2022; Received in revised form 30 June 2022; Accepted 20 July 2022
Available online 31 July 2022
1746-8094/© 2022 Elsevier Ltd. All rights reserved.
G.S. Tandel et al. Biomedical Signal Processing and Control 78 (2022) 104018

to detect any abnormality in human body organs in 2D or 3D formats characterizing tumor heterogeneity. The best discrimination was seen
[14,15]. Therefore, MRI is the first and foremost choice of doctors for using SD at the fine texture scale and its outcomes in terms of sensitivity
primary tumor diagnosis due to its high-resolution images. However, and specificity were 93 % and 81 % (AUC 0.910, p < 0.0001),
tumor grading using MRI is challenging because of inter-observer vari­ respectively.
ations. This process is also error-prone because tumors have dissimilar From the above-discussed studies, some bottlenecks of ML tech­
shapes and sizes, and do not have enough visible landmarks [10,16]. niques are identified such as (1) feature selection is a complex and time-
Resultantly, it is a time-consuming process for decision-making. Thus, consuming process because many features are possible, and missing any
we can say that human diagnosis is time taking, tedious, unreliable, and suitable features may lead to misclassification. (2) Morphological
error-prone due to inter-observer variations [17,18]. Early and accurate feature-based classification may easily lead to misclassification due to
diagnosis may lead to longer survival among patients. On the other the similar appearance of various types of tumors. These limitations can
hand, invasive, inaccurate, and time-consuming diagnoses will shorten be overcome by DL-based methods [23], where the features are auto­
the life span of a cancer patient. However, this method is primarily used matically extracted from images. DL models were shown superior results
in tumor grading in the absence of a suitable alternative. Thus, artificial than ML [24]. DL methods are not only saving the time of feature se­
intelligence (AI) based computer-aided diagnosis (CAD) tool is highly lection but also domain knowledge is not required. Some of the recently
needed, which can perform a prompt, consistent and accurate diagnosis. published DL-based methods for brain tumor classification are presented
Therefore, an AI-based CAD tool is proposed in this study for the auto­ below.
mated classification of gliomas between low-grade glioma (LGG) and ROI segmentation-based MRI classification method was presented by
high-grade glioma (HGG) and to enhance its performance with the Yang et al. [25] for LGG and HGG classification using Cancer Imaging
proposed ensembling method. The proposed method can be used as a Archive (TCIA) dataset with 113 gliomas patients. The manually crop­
second opinion by the doctors or as an alternative to a biopsy. The ped ROI segmentation method was adopted. The proposed study
summary of the entire study is as follows; the introduction and literature compared the classification performance of fine-tuned pre-trained CNNs
survey are given in Section 1 and Section 2. The data preparation is and CNNs trained from scratch. Experiments depicted that pre-trained
presented in Section 3. The methodology is described in Section 4. Re­ CNN (GoogleNet) with transfer learning shows superior performance
sults and Discussion are given in Section 5 and Section 6, respectively. than training the network from scratch and achieved the best test ac­
The conclusion is given in Section 7. curacy of 90 %, using GoogleNet. Banerjee et al. [26] compared the
performance of CNNs that was trained from scratch and pre-trained
2. Literature review networks for LGG versus HGG classification. The proposed study was
fine-tuned on the BRATS 2017 dataset. CNN models (PatchNet, SliceNet,
Using various medical imaging modalities, CAD tool development for and VolumeNet), trained from scratch and compared with the two pre-
tumor grading is a hot area of research. Where MRI is the first choice of trained ConvNets (VGGNetand ResNet). VolumeNetdemonstratedthe
researchers. The major steps of CAD tools are preprocessing, segmen­ highest accuracy of 95 % for LGG versus HGG classification.Sharif et al.
tation, feature extraction, and classification. Feature selection is an [16] presented a DL-based approach for automated brain tumor ROI
important aspect of CAD tools. Based on feature selection, AI methods segmentation and classification using the Inception V3 pre-trained CNN
are divided into two categories, Machine Learning (ML) and Deep model. The method was applied to BRATS2013, 2014, 2017, and 2018
Learning (DL). In the traditional ML models, the features are manually datasets. the applied method achieved more than 92 % average classi­
defined and known as handcrafted features. On the other hand, in the DL fication accuracy on the above datasets.
techniques, the features are automatically extracted from the images Tandel et al. [24] performed a DL-based study for brain tumor
during the training. Some of the traditional ML-based studies for brain classification using partially segmentation-based (skull-stripping) five
tumor classification are summarized below. clinically relevant multiclass datasets. The pre-trained model AlexNet
Alis et al. [19], presented an artificial neural network (ANN) based has outperformed six ML-based models. The highest average accuracy of
study for gliomas grading between LGG and HGG. A cohort of 181 pa­ two-, three-, four-, five-, and six-class datasets were 100, 95.97, 96.65,
tients including 97 with HGG (53.5 %) and 84 with LGG (46.5 %) 87.14, and 93.74 %, respectively using AlexNetin three kinds of cross-
participated in the study. high-order texture features and histogram validation protocols (K2, K5, and K10). Another partially
parameters were extracted from manual cropped ROIs from T2W-FLAIR segmentation-based brain tumor classification method was proposed by
and contrast-enhanced T1W images. Using ANN models, a test cohort of Khan et al. [27]. The method was used two pre-trained models (CNN),
60 patients was shown an area under the receiver operating character­ namely VGG16 and VGG19 in the transfer learning paradigm for feature
istic curve (AUC) of 0.87 and 0.86 for the T2W-FLAIR and contrast- extraction. The extreme learning machine (ELM) was used for classifi­
enhanced T1W dataset, respectively. The highest diagnostic accuracy cation and obtained an accuracy of 97.8 %, 96.9 %, and 92.5 % for
was 88.3 % with an AUC of 0.92. Further, Ditmer et al. [20], performed BraTs2015, BraTs2017, and BraTs2018, respectively. Ge et al [6], pro­
aradiomic-based filtration-histogram texture analysis for LGG versus posed a deep semi-supervised learning method using multi-stream 2D
HGG classification. A region of interest was manually delineated on CNN, whereas Generative Adversarial Networks were used for the
post-contrast T1W MRI. In histogram analysis, feature such as mean, augmentation. The proposed scheme was tested on two datasets TCGA
standard deviation, entropy, mean of the positive pixels, skewness, and and MICCAI and showed 86.53 % and 90.70 % test accuracy on
kurtosis histogram was examined. The cohort of 94 patients was used TCGAand MICCAI datasets, respectively for LGG versus HGG classifi­
with 14 LGG and 80 HGG patients. The texture features mean, SD, MPP, cation. DL models can extract suitable features directly from the image
entropy, and kurtosis were shown a significant difference between the [23]. Referring to this hypothesis many researchers have used the
grades of gliomas with a sensitivity of 93 % and specificity of 86 % (AUC whole-brain MR image instead of ROI segmented images for brain tumor
of 0.90).Another study by Zhan et al. [21], was performed LGG versus classification. Some of the recently published whole-brain MR image-
HGG classification using intensity, volume, and local binary patterns based brain tumor classification studies are as follows; Khawaldeh
(LBP) features. PCA was used for feature reduction before the classifi­ et al. [28] categorized whole-brain MRI into LGG and HGG classes using
cation. The average grade accuracy of 87.59 % was obtained using the the pre-trained model AlexNet. The highest classification accuracy of
KNN classifier on the BraTS2015 dataset. Similarly, Skogen et al. [22] 91.16 % was obtained for LGG versus HGG MRI classification. Similarly,
was performed ML-based gliomas grading using 95 patients. The ROI- Hassan Ali Khan et al. [29], presented the whole image-based cancerous
based image segmentation method was used for the classification be­ and non-cancerous MRI classification method using pre-trained CNN.
tween LGG and HGG. Whereas filtration-histogram technique and the The method used three pre-trained models, VGG-16, ResNet-50, and
statistical parameter, standard deviation (SD) were used for Inception-v3, and achieved the highest accuracy of 96 % using VGG-16.

2
G.S. Tandel et al. Biomedical Signal Processing and Control 78 (2022) 104018

Fig. 1. The global architecture of gliomas grading system.

Anaraki et al. [30], proposed a CNN architecture that was developed on brain tumor classification, we have compared the above three seg­
using a genetic algorithm (GA) and classified brain tumors among the mentation methods in the deep learning paradigm. Second, many
three classes Meningiomas, Gliomas, and Pituitary tumors using a studies used multiple models, while highlighting the highest performing
whole-brain MRI-based dataset. The highest accuracy of 94.2 % was models. As a result, the potential of other models was not used. To utilize
seen using the proposed GA-based CNN. In another study, a pre-trained the potential of multiple models, a majority voting-based ensemble al­
GoogleNet was used by Deepak et al. [31] for three-class brain tumor gorithm was designed to enhance the overall classification performance.
MRI classification using a whole-brain MR image dataset. The proposed Thirdly, it is also observed that CNN models show inconsistent perfor­
scheme achieved 98 % classification accuracy using five-fold cross- mance irrespective of the number of deep layers. Therefore, the effect of
validation. Another similar study was presented by Badža et al. [32]. increasing the order of the deep layers of multiple models is analyzed on
Whereas a new CNN architecture was proposed for brain tumor classi­ the accuracy of gliomas classification and the training time of the model.
fication and proved simpler than already-existing pre-trained networks. The global architecture of the proposed system is given in Fig. 1. In
The proposed CNN achieved maximum accuracy of 96.56 % on whole- nutshell, the major contributions of the paper are as follows
brain MRI using 10-fold cross-validation. In another interesting study,
Ziquan Zhu [33] and his team proposed a deep learning-based three • A deep learning-based computer-aided diagnosis tool is proposed for
novel methods for brain disease classification, called DenseNet-based automated glioma classification into low-grade and high-grade using
SNN (DSNN), DenseNet-based RVFL (DRVFL), and DenseNet-based MRI.
ELM (DELM). The purpose of these models is to overcome the limita­ • Compared ROI and skull stripped segmentation method with whole
tions of the deep learning method against overfitting small data. The brain-MRI data.
pre-trained “customize” DenseNet is the foundation of the three pro­ • To maximize the classification performance of five convolutional
posed models. On the empirical dataset, the updated DenseNet is fine- neural networks, a majority voting-based ensemble algorithm is
tuned. Finally, SNN, ELM, and RVFL are used to replace the remaining proposed.
five layers of the fine-tuned DenseNet. The proposed DSNN method • Analyze the effect of the deep layer on accuracy and training time.
achieved a best five-fold cross-validation performance of 98.46 % ±
2.05 %, 100.00 % ± 0.00 %, 85.00 % ± 20.00 %, 98.36 % ± 2.17 %, and 3. Data preparation
99.16 % ± 1.11 % in terms of accuracy and sensitivity, Specificity, Ac­
curacy and F1-Score, respectively. In this study, public brain tumor data were used, taken from the
Although DL methods were extensively used in the above studies and Cancer Imaging Archive (TCIA) repository [35,36]. This dataset was
own many merits. Some limitations are also identified. Firstly, the developed by world-renowned hospitals, Thomas Jefferson University
above-discussed studies were used three types of segmentation ap­ (Pennsylvania, USA) and Henry Ford Hospitals (USA). In this repository,
proaches, (1) tumor region of interest (ROI) segmentation [16,25,26], the brain tumor data is known as “The Repository of Molecular Brain
(2) partial segmentation (Skull stripping) [6,24,27], and (3) whole-brain Neoplasia Data (REMBRANDT)”. The dataset contains MRI scans of 130
MRI (without segmentation) [28,29,30,31,32]. However, the segmen­ brain tumor patients. MRI scans are available in different MRI sequences
tation methods require human intervention and additional computa­ such as T1W, T2W, FLAIR, diffusion-weighted imaging (DWI), and their
tional efforts. The segmentation method makes computer-aided subtypes. The average age and survival rate of the patients were 47.5
diagnostic tools semi-automatic and computationally expensive. To years, and 47 months, respectively. The ground truth is confirmed his­
some extent, deep learning methods are self-contained in extracting topathology and is available for only 112 patients. The 112 patients
suitable features from images [34]. To analyze the segmentation effect included 47, 21, and 44 groups of patients with three tumor types called

3
G.S. Tandel et al. Biomedical Signal Processing and Control 78 (2022) 104018

Table 1 and class is given in Table 1. However, CNN can automatically extract
Sample details of three datasets. appropriate features from training samples (images), so there is no need
Dataset #Patients Class samples to define features manually.
LGG HGG Total LGG HGG
3.1. Data design
WBM 44 68 112 623 781
SSM 44 68 112 623 781
RSM 44 68 112 623 781
Whole Brain MRI Data: In whole-brain MRI (WBM) data, whole MR
images were used in axial view without any image enhancement oper­
ation, because any image enhancement operation can alter the original
astrocytoma (AST), oligodendroglioma (OLI), and glioblastoma- tumor features. In this dataset, the ground truth was assigned to the
multiforme (GBM), respectively.The AST group of patients consisted of entire MR image regardless of the tumor area.
30 and 17 patients with AST (grade-2) and AST (grade-3 (G3)), Skull Stripped MRI Data: The skull-stripped MRI (SSM) data was
respectively.Similarly, the OLI group was divided into 14 and 7 patients partially segmented, whereas the outer brain skull was segmented from
with OLI (grade-2 (G2) and OLI (grade-3 (G3)), respectively.44 patients the 3D MRI brain volume. This operation was performed through
of GBM group were available in grade 4 (G4). BrainSuite18a software (University of California, Los Angeles). Brain­
The objective of this study is to classify the glioma between low Suite 18a software can operate in automatic as well as manual mode. It
grades and high grades. LGG and HGG grades were designed with can work in the automated model for high-quality MRI with a high
reference to some of the earlier studies [22,28]. The 112 patient data signal-to-noise ratio and automatically detect the extent of the skull. On
were divided into LGG and HGG classes, where AST (g2), and OLI (g2) the other hand, manual models have to opt for poor-quality MRI (low
patients were included in the LGG class. On the other hand, AST (g3), signal-to-noise ratio), and parameters (such as diffusion constant and
OLI (g3), and GBM (g4) were included in the HGG class. A total of 44 edge constant parameters) need to be manually adjusted repeatedly
patients were in the LGG category, while 68 patients were in the HGG when until the exact extent of the skull is not known. In this dataset, the
category.MRI in T2W sequences was used in this study. All the brain ground truth was assigned to the skull-stripped MR images instead of the
scans were analyzed and preprocessed (Skull stripped and ROI tumor area.
segmented) via BrainSuit software [37]. Since some slices of the MRI Tumor Region of Interest Segmented MRI Data: In the RSM data, the
scan contain tumor lesions, therefore, each slice was carefully examined tumor ROIs were manually cropped with a fixed patch size of (70 × 70)
and the center slice (SC) was detected where the tumor lesion was pixels in the axial view from the entire MRI and resized according to the
maximum. Further, surrounding n nearby slices of the center slice were input requirements of the CNN. The ground truth was assigned to the
carefully selected. A total of (Sc ± n) slices were selected from each MR segmented tumor ROI portion of the MR image.
scan, whereas the value of n may be different for each patient depending Some sample images from the above three dataset are shown in
on the size of the tumor. Three datasets of LGG and HGG sections were Fig. 2.
designed from the same patients but with different segmentation
methods such as ROI segmentation, skull-stripping segmentation, and
whole-brain image (without segmentation). A sample of each dataset

Fig. 2. Sample images of (a) WBM: whole-brain MRI, (b) SSM: skull stripped MRI, and (c) RSM: ROI segmented MRI.

4
G.S. Tandel et al. Biomedical Signal Processing and Control 78 (2022) 104018

Fig. 3. The local architecture of the method.

measures to suppress overfitting. Data augmentation is one of the pop­


Table 2
ular techniques to suppress overfitting through artificial data augmen­
The mathematical expression of performance parameters.
tation techniques. In this study scaling and rotation operations were
Parameter Mathematical Formula performed on the data to artificially enlarge the dataset. Images were
Accuracy (ACC) (TP + TN) scaled randomly between scaling factors [0.9–1.1]. Similarly, images
ACC = × 100
(TP + TN + FP + FN) were randomly rotated between [− 30 to 30] degree angles. As a result,
Sensitivity (SEN)
SEN =
(TP)
× 100 the size of the data increases by three times the original during training.
(TP + FN)
Specificity (SPC) (TN)
SPC = × 100 4. Methodology
(TN + FP)
Positive Predictive Value (PPV) (TP)
PPV = × 100
(TP + FP) In this method, experiments were performed on five-fold cross-vali­
Negative Predictive Value (NPV)
dation (K5-CV) protocol. The local architecture of the brain tumor
(TN)
NPV = × 100
(TN + FN)
grading system is depicted in Fig. 3, which shows the one-round process
of K5-CV. In the K5-CV method, five random sets were created with 80 %
training and 20 % test samples. During training, the training set will be
Table 3
logically divided into 80 % training and 20 % validation sets. After
Training parameters for convolutional neural networks.
completion of training in each round, separately trained models were
Training parameter for CNN Values generated for different CNN models. The test sets were applied to the
Epochs 100 trained models to predict class labels. The predicted class labels were
Batch Size 10 compared with the ground truth and true positive (TP), true negative
Average Iterations 7800
(TN), false positive (FP), and false-negative (FN) were calculated based
Learning Rate 0.0001
Training Protocol K5-CV on assumed classes. In this study, LGG was assumed as a negative class
and HGG as a positive class. With the help of metrics such as TP, TN, FP,
and FN, the performance of the experiment was evaluated in terms of
3.2. Preprocessing accuracy (ACC), sensitivity (SEN), specificity (SPC), positive predictive
value (PPV), negative predictive value (NPV), and the area under the
Deep learning models (CNNs) generally provide good classification curve (AUC). The mathematical expression of the above parameters is
performance on large datasets (ie, millions of images). In the absence of given in Table 2. Simulations were performed on an NVIDIA Quadro K
a large dataset, there is a risk of overfitting during training. Since 6000 series GPU machine with 64 GB of memory. A free trial deep
medical data is always limited, it is highly necessary to take relative learning toolbox from Matlab 2021 was used for the experiments. The

5
G.S. Tandel et al. Biomedical Signal Processing and Control 78 (2022) 104018

Fig. 4. Transfer Learning for brain tumor classification.

from images than ML-based classification techniques, where feature


Table 4 selection plays a key role. [38].However, getting better results requires
Parameters summary of five pre-trained CNNs.
appropriate architecture, hyper-parameters, and processed data. The DL
Model Layer Depth Parameters Input image size model especially ’CNN’ has recently been extensively used for brain
(In Millions)
tumor-related studies because of the above-mentioned properties.
AlexNet 8 60 227 × 227 ×3 Earlier, several initiatives were taken by researchers using CNN for brain
VGG16 16 138 224 × 224 ×3 tumor grading [31,39–44]. The researchers aim to find suitable model
ResNet18 18 11.7 224 × 224 ×3
GoogleNet 22 7 224 × 224 ×3
architectures, training conditions, and parameters to enhance tumor
ResNet50 50 25.6 224 × 224 ×3 classification results. However, earlier research was done on limited
models. Therefore, this study aims to analyze the behavior of multiple
DL models in their increasing order of deep layers for brain tumor
values of CNN’s training parameters such as learning rate and batch size classification and to optimize their performance using an ensemble
were fixed through experimental observation. In this observation, the approach. Yang et al. 2018 [25] performed an interesting study forLGG
learning rate and batch size were found to be inversely proportional to and HGG classification using the CNN model with the transfer learning
the performance and training time of the CNN model. Therefore, they (TL) method. They trained the model from scratch and re-trained the
were kept as small as possible. All the training parameters are described pre-trained model and compared their performance. As a result, the
in Table 3. performance of the pre-trained CNN was found to be better than the
training model from scratch. Therefore, in order to increase the per­
formance of the model, it is a wise idea to adopt the TL method.
4.1. Transfer learning method To enhance the learning process on a limited dataset, TL is a tech­
nique where the knowledge of the first trained model (weights) is used
DL methods are the subset of ML techniques. It has been demon­ for the second set of data. Even though both datasets may be from
strated that DL models are more effective at extracting desirable features

Fig. 5. The conceptual idea of the MajVot algorithm.

6
G.S. Tandel et al. Biomedical Signal Processing and Control 78 (2022) 104018

Table 5 4.2. Pre-trained models


Model-wise five-fold cross-validation performances of RSM data.
Models ACC SEN SPC AUC PPV NPV Five well-established pre-trained CNN models AlexNet, VGG16,
ResNet18, ResNet50, and GoogleNet were used in this study. They are
Mean Mean Mean Mean Mean Mean
± SD ± SD ± SD ± SD ± SD ± SD the winner of different years of the ILSVRC challenge. A detailed dis­
cussion of the above models is as follows.
AlexNet 96.82 96.99 96.62 96.81 97.70 95.54
± 1.06 ± 0.58 ± 2.44 ± 1.28 ± 1.68 ± 0.89
AlexNet was introduced by Alex Krizhevsky et al. [48] and was the
VGG16 96.32 95.46 97.78 96.62 98.55 93.04 winner of the ILSVRC 2012 competition. The AlexNet architecture
± 1.21 ± 1.97 ± 1.50 ± 1.02 ± 1.01 ± 3.24 consists of 8 layers, including 5 convolutional layers, 2 fully connected
ResNet18 97.83 97.50 98.37 97.93 98.91 96.25 layers, and 1 softmax layer. After the convolution operation, max
± 0.26 ± 0.62 ± 0.99 ± 0.32 ± 0.66 ± 0.98
pooling, and normalization were performed three and two times,
GoogleNet 95.17 95.37 94.90 95.13 96.62 93.04
± 2.05 ± 2.08 ± 2.40 ± 2.08 ± 1.61 ± 3.18 respectively. Similarly, Rectified Linear Unit (ReLU) is a nonlinear
ResNet50 97.11 97.01 97.27 95.34 98.18 95.54 activation function that is applied after each convolutional operation. In
± 0.81 ± 1.60 ± 0.57 ± 0.63 ± 0.43 ± 2.45 addition, to suppress overfitting, two dropout layers were introduced
MajVot 99.06 99.04 99.10 99.07 99.39 98.57 before the first and second FC layers. The size of the input image for
± 0.55 ± 0.69 ± 0.64 ± 0.54 ± 0.43 ± 1.02
AlexNet is 227x227x3 with a total of 60 million parameters.
VGG16 was proposed by K. Simonyan and A. Zisserman [49]in
different domains [45,46]. Suppose Task T1 has a lot of data and Task T2 ILSVRC-2014from the University of Oxford and it was the runner-up of
has comparatively fewer data. In the TL technique, we can use the the competition. The main contribution of this network was to develop a
previously learned weights of task T1 as the initial step for task T2. In network of increasing depth using an architecture with very small (3 ×
addition, the same model can be retrained for task T2 to generalize this 3) convolution filters. This network was trained over 14 million images
knowledge (features, weights). The concept of TL is described in Fig. 4. belonging to 1000 classes and achieved 92.7 % top-5 test accuracy in the
Now we wonder where the already existing trainee models on large ImageNet dataset. Originally VGGNet was 16 layers deep network
datasets will come from. This question has an answer in the ImageNet (VGG16), later it was proposed in 19 layers (VGG19).
Large Scale Visual Recognition Challenge (ILSVRC) forum. A large-scale GoogleNetwas introduced by Christian Szegedy and his team in
dataset (i.e. Millions of images) was created by ILSVRC. This dataset is ILSVRC 2014 [50] and was the winner of the same competition. It is a 22
also known as ImageNet [47]. This organization organizes a workshop layers deep network, while the top-5 error of this model reaches up to
every year to evaluate algorithms for large-scale object detection and 6.67 %. The basic building block of this model was ’Inception’ which
image classification. Well-known DL models like AlexNet, VGGNet, was designed to reduce the number of intermediate parameters by using
ResNet, GoogleNet, etc. were discovered in this workshop. These models very small convolution filters (1 × 1).
have shown significant classification performance in many applications Residual net (ResNet) was first presented by Kaiming He and his
in the computer vision and medical fields. The model is trained on team [51] in ILSVRC 2015 by the Microsoft research group. This mode
millions of images of more than 1000 natural objects such as table took first place in a similar competition in the classification task with an
chairs, cats, dogs, etc. Their classification accuracy has reached near- error of 3.57 % on the ImageNet test set. The major advantage of this
human intelligence. Therefore, in this study, a pre-trained CNN model architecture is that it can reduce the cost of layer depth in terms of
(trained on the ImageNet dataset) was used. A summary of the above computational time and performance. ResNet architecture has been
models such as layer size, parameters, and input size is given in Table 4 proposed in different layering versions such as 18, 50, 101, etc. In this
and a detailed discussion of the pre-trained network is given in Section study, we have adopted 18 layers (ResNet18) and 50 layers (ResNet50)
4.2. of deep architecture.

Fig. 6. Model-wise performance comparison of RSM Data.

7
G.S. Tandel et al. Biomedical Signal Processing and Control 78 (2022) 104018

Fig. 7. Accuracy behavior of RSM Data in the five-fold test performance.

Table 6 Table 8
Five-fold cross-validation performances of SSM data. Accuracy improvement of three datasets using MajVot algorithm against five
Models ACC SEN SPC AUC PPV NPV
CNN.
Mean Mean Mean Mean Mean Mean Dataset AlexNet VGG16 ResNet18 GoogleNet ResNet50
± SD ± SD ± SD ± SD ± SD ± SD
MajVot (RSM) 2.26 % 2.77 % 1.24 % 3.93 % 1.97 %
AlexNet 95.08 97.75 93.07 95.41 92.20 97.92 MajVot (SSM) 3.60 % 0.74 % 1.72 % 1.31 % 2.86 %
± 3.86 ± 1.12 ± 6.23 ± 3.32 ± 7.60 ± 1.07 MajVot (WBM) 2.68 % 0.65 % 0.80 % 2.75 % 2.17 %
VGG16 97.90 97.60 98.25 97.92 98.22 97.58
± 0.66 ± 0.75 ± 1.51 ± 0.66 ± 1.57 ± 0.82
ResNet18 96.94 98.69 95.41 97.05 95.12 98.72
± 0.73 ± 1.47 ± 1.87 ± 0.70 ± 2.07 ± 1.45
Table 9
GoogleNet 97.34 96.53 98.20 97.37 98.22 96.47
The model-wise average performance of three datasets.
± 1.44 ± 2.37 ± 1.06 ± 1.43 ± 1.06 ± 2.43
ResNet50 95.81 98.62 93.35 95.99 92.85 98.72 Model ACC SEN SPC AUC PPV NPV
± 0.93 ± 0.98 ± 1.17 ± 0.90 ± 1.34 ± 0.91
AlexNet 95.90 96.96 95.02 96.02 95.39 96.23
MajVot 98.63 98.70 98.57 98.63 98.54 98.72
VGG16 97.34 97.09 97.79 97.33 98.20 96.14
± 0.46 ± 0.44 ± 0.66 ± 0.46 ± 0.68 ± 0.44
ResNet18 97.47 97.91 97.19 97.37 97.41 97.33
GoogleNet 96.08 96.01 96.14 96.14 96.99 94.95
ResNet50 96.41 98.18 94.66 95.56 95.13 97.66
MajVot 98.71 98.69 98.75 98.27 98.92 98.41
Table 7
Five-fold cross-validation performances of WBM data.
Models ACC SEN SPC AUC PPV NPV performance for brain tumor classification. During the experiments, it
Mean Mean Mean Mean Mean Mean was observed that the above CNN model showed inconsistent perfor­
± SD ± SD ± SD ± SD ± SD ± SD mance in increasing the order of deep layers. In addition, an inconsistent
AlexNet 95.80 96.14 95.38 95.85 96.28 95.22 performance was also observed by CNNs in five-folds of the data. In
± 0.92 ± 0.45 ± 1.71 ± 0.91 ± 1.37 ± 0.58 order to generate consistent performance across each fold of the data
VGG16 97.79 98.20 97.33 97.43 97.83 97.80 and further maximize the overall performance, we have proposed an
± 1.02 ± 1.65 ± 2.08 ± 1.50 ± 1.73 ± 2.00
ResNet18 97.65 97.55 97.79 97.13 98.20 97.01
ensemble algorithm using a majority voting mechanism. In this mech­
± 1.11 ± 0.84 ± 2.11 ± 1.65 ± 1.71 ± 1.02 anism, the vote of each class label will be counted using the predicted
GoogleNet 95.73 96.11 95.32 95.91 96.12 95.36 probability of each mode for each test sample in terms of zero and one.
± 1.65 ± 2.67 ± 2.54 ± 1.40 ± 2.15 ± 3.05 Further, from the majority vote of many models, true class labels have
ResNet50 96.30 98.92 93.34 95.34 94.36 98.72
been predicted. To avoid a tie between the two classes due to equal vote
± 0.93 ± 2.99 ± 3.64 ± 1.03 ± 3.09 ± 3.37
MajVot 98.43 98.33 98.57 98.45 98.84 97.95 share, the number of models (voters) was kept odd (five). The concep­
± 0.89 ± 1.25 ± 1.04 ± 0.88 ± 0.84 ± 1.53 tual idea of the MajVot algorithm is shown in Fig. 5. Recently, we have
used the same algorithm for performance optimization of brain tumor
classification using whole image data as in our study [52]. This study is a
4.3. Majority voting algorithm future extension of our previous work. A detailed discussion of the al­
gorithm is given below.
This study aims to observe the behavior of varying deep layers of The Majority voting (MajVot) algorithm takes ‘n’ models (M1, M2,
different models on accuracy and training time and optimize their M3,……Mn) and dataset (DS) as an argument. The dataset was divided

8
G.S. Tandel et al. Biomedical Signal Processing and Control 78 (2022) 104018

Table 10 test sample (Si). If (Tvc(LGG) > Tvc(HGG) then the predicted label via
Benchmarking of the proposed method with existing state-of-the-art methods. the MajVot algorithm was LGG otherwise HGG. Likewise, the maximum
Segmentation Reference Data Source Deep Highest probability (MaxP) of n models was assigned to the predicted label, and
Method Learning Accuracy the (1-MaxP) probability was assigned to the unpredicted label. This
Method process was repeated for all the test samples. The performance of the
(CNN)
MajVot algorithm was evaluated based on a comparison of ground truth
(1) Yang et al. TCIA GoogleNet 90 % and predicted labels.
ROI [25] (REMBRANDT)
Segmented Banerjee MICCAI Proposed 97 %
Majority Voting Algorithm.
MRI (RSM) et al. [26] (BraTs 2017) ConvNets
Data Sharif et al. MICCAI Inception V3 92 %
[16] (BraTs Algorithm: Majority Voting (MajVot)
2013–2018)
Input:n Models, Training set (Ts), Test set (Tt)
Proposed TCIA MajVot 99.06 %
Output: Predicted class labels, label probability score, and performance evaluation.
Method on (REMBRANDT) (AlexNet,
1. Train all n models on the same training set Ts
RSM Data VGG16,
2. Test a single sample of the test set (Tt) by all trained models.
ResNet18,
3. Predicted probability score of the label by each trained model of the given sample.
GoogleNet,
4. Calculate the vote of each label for the tested sample by the following rule. IF the
ResNet50)
predicted probability by a model for label LGG>= 0.5 THEN vote = 1 for label
(2) Tandel et al. TCIA AlexNet 94.70 %
LGGand vote = 0 for label HGG. Otherwise, vote = 1 for label HGGand vote = 0 for
Skull [24] (REMBRANDT)
label LGG.
Stripped Khan et al. MICCAI VGG16 97.8 %
5. Repeat step 4 for all the trained models.
MRI (SSM) [27] (BraTs 2015)
6. Calculate the sum of votes of each predicted label by all the models. IF the sum of
Data Chenjie Ge MICCAI Multi-stream- 90.70 %
the votes of label LGG > sum of the votes of label HGGTHEN label LGG will predict.
et al. [6] (BraTs) 2D CNN
Otherwise, label HGG will predict.
Proposed TCIA MajVot 98.63 %
7. Repeat step 3 to step 6 for all the test samples.
Method on (REMBRANDT) (AlexNet,
8. Compare the predicted labels with the actual ground truth of all the samples of the
SSM Data VGG16,
test set and evaluate the performance of the method.
ResNet18,
GoogleNet,
ResNet50)
(3) Khawaldeh TCIA Modified- 91.16 %
Whole-Brain et al. [28] (REMBRANDT) AlexNet 5. Results
MRI (WBM) Ali Khan Brain MRI VGG-16 96 %
Data et al. [29] (Kaggle)
Anaraki TCIA GA based 94.2 % Four experimental protocols are included throughout the study. The
et al. [30] (REMBRANDT) CNN first experimental protocol is to analyze the data-wise performance. The
(TCGA (LGG and second experimental protocol compares the performance of the MajVot
GBM))
algorithm against other models. The average performance of the three
Deepak et al. Figshar GoogleNet 98 %
[31]
datasets is compared to the third experimental protocol. The fourth
Badža et al. Nanfang Hospital Proposed 96.56 % experimental protocol investigates the impact of the deep layer on ac­
[32] and General CNN curacy and training time. All the results are the average of five-fold test
Hospital, Tianjin set performances and expressed in mean and standard deviation (SD).
Medical
Maximum performance in each table is highlighted in bold font.
University, China
Proposed TCIA MajVot 98.43 %
Method on (REMBRANDT) (AlexNet, 5.1. Experimental protocol 1: Data-wise performance analysis
WBM Data VGG16,
ResNet18,
Data-wise analysis of three datasets (RSM, SSM, and WBM) is dis­
GoogleNet,
ResNet50)
cussed in this experimental protocol. The test performance of one round
of a model (PMi )is described by Equation (1), which is a set of the
TCIA: Cancer Imaging Archive, REMBRANDT: The Repository of Molecular following performance parameters; (ACC, SEN, SPC, AUC, PPV, and
Brain Neoplasia Data, MICCAI: Medical Image Computing and Computer-
NPV). Six models were used in this study, including five pre-trained
Assisted Interventions Society, BraTs: Brain Tumor Segmentation, RSM: ROI
CNNs such as AlexNet, VGG16, ResNet18, GoogleNet, ResNet50, and a
Segmented MRI, SSM: Skull Stripped MRI, WBM: Whole-Brain MRI, GA: genetic
algorithm proposed MajVot algorithm. Thus, the value of i ranges from one to six.
The performance analysis of each dataset is discussed below.
into K parts of the training (Ts) and test (Tt) set. In this study we have PMi = (ACCMi , SEMi , SPMi , AUCMi , PPVMi , NPVMi ) (1)
adopted five-fold cross-validation, therefore five independent sets were
made of 80 % of DS for training and 20 % of DS for testing. Further, all 5.1.1. Performance analysis of ROI segmented MRI data
the models were trained via each training set and trained models were The average five-fold test performance of RSM data of ith model
generated (TM1, TM2, TM3,……TMn). ith trained model (TMi) predicts (∀PMi ) is described by Equation (2). The variable Ris described the
probabilities Pi (LGG), and Pi (HGG), for class label LGG, and HGG, number of rounds of the training and test operations of a model. The
respectively for a sample (Si). Based on the predicted probability of each average five-fold test performances of RSM data using five CNN models
label, the voting mechanism was designed. The vote of each ith model for and one MajVot algorithm are compared in Table 5. The highest clas­
label LGG and HGG were calculated using vote count variables vcLGG(i) sification performance was observed using the MajVot algorithm and is
and vcHGG(i), respectively. If the predicted probability of model (TMi) as follows ACC: 99.06 ± 0.55, SEN: 99.04 ± 0.69, SPC: 99.10 ± 0.64,
is (Pi (LGG) > Pi (HGG)) then vote count of class label LGG: vcLGG(i) = 1, AUC: 99.07 ± 0.54, PPV: 99.39 ± 0.43, and NPV:98.57 ± 1.02. MajVot
and HGG: vcHGG(i) = 0 and wise a versa. Further, the total vote of all the algorithm significantly improves the classification accuracy by 2.26 %,
models for labels LGG and HGG of a sample Si was calculated using total 2.77 %, 1.24 %, 3.93 % and 1.97 % against AlexNet, VGG16, ResNet18,
vote count variables, Tvc(LGG) and Tvc(HGG), respectively. Further­ GoogleNet, and ResNet50, respectively.The performance of five CNNs
more, the following mechanism was developed for label prediction using against the MajVot algorithm is compared in Fig. 6. The detailed results
the MajVot algorithm using the above total vote shares of n models for a of five-fold test sets of RSM data using AlexNet, VGG16, ResNet18,

9
G.S. Tandel et al. Biomedical Signal Processing and Control 78 (2022) 104018

Fig. 8. Model-wise performance comparison of SSM Data.

Table A1
Five-fold test results of RSM data using AlexNet.
#Round TP TN FP FN ACC SEN SPC AUC PPV NPV ITR TIME

1 162 108 3 4 97.47 97.59 97.30 97.44 98.18 96.43 7200.00 36.00
2 159 106 6 6 95.67 96.36 94.64 95.50 96.36 94.64 7200.00 35.00
3 158 107 7 5 95.67 96.93 93.86 95.40 95.76 95.54 7200.00 37.00
4 165 106 0 6 97.83 96.49 100.00 98.25 100.00 94.64 7200.00 45.00
5 162 108 3 4 97.47 97.59 97.30 97.44 98.18 96.43 7200.00 44.00
AVG 161.20 107.00 3.80 5.00 96.82 96.99 96.62 96.81 97.70 95.54 7200.00 39.40
SD 2.77 1.00 2.77 1.00 1.06 0.58 2.44 1.28 1.68 0.89 0.00 4.72

Table A2
Five-fold test results of RSM data using VGG16.
#Round TP TN FP FN ACC SEN SPC AUC PPV NPV ITR TIME

1 161 107 4 5 96.75 96.99 96.40 96.69 97.58 95.54 7200.00 305.00
2 161 106 4 6 96.39 96.41 96.36 96.39 97.58 94.64 7200.00 303.00
3 163 98 2 14 94.22 92.09 98.00 95.05 98.79 87.50 7200.00 331.00
4 165 104 0 8 97.11 95.38 100.00 97.69 100.00 92.86 7200.00 384.00
5 163 106 2 6 97.11 96.45 98.15 97.30 98.79 94.64 7200.00 344.00
AVG 162.60 104.20 2.40 7.80 96.32 95.46 97.78 96.62 98.55 93.04 7200.00 333.40
SD 1.67 3.63 1.67 3.63 1.21 1.97 1.50 1.02 1.01 3.24 0.00 33.20

Table A3
Five-fold test results of RSM data using ResNet18.
#Round TP TN FP FN ACC SEN SPC AUC PPV NPV ITR TIME

1 165 106 0 6 97.83 96.49 100.00 98.25 100.00 94.64 7200.00 136.00
2 163 108 2 4 97.83 97.60 98.18 97.89 98.79 96.43 7200.00 82.00
3 163 109 2 3 98.19 98.19 98.20 98.20 98.79 97.32 7200.00 107.00
4 163 108 2 4 97.83 97.60 98.18 97.89 98.79 96.43 7200.00 76.00
5 162 108 3 4 97.47 97.59 97.30 97.44 98.18 96.43 7200.00 102.00
AVG 163.20 107.80 1.80 4.20 97.83 97.50 98.37 97.93 98.91 96.25 7200.00 100.60
SD 1.10 1.10 1.10 1.10 0.26 0.62 0.99 0.32 0.66 0.98 0.00 23.70

GoogleNet, and ResNet50, and MajVot algorithm are given in Table A1, ∑5
Table A2, Table A3, Table A4, Table A5, and Table A6 in Appendix-A, ∀PMi = R=1 PMi (RSM)
(2)
respectively. The behavior of the five-fold test accuracy of each model 5
is shown in Fig. 7. This suggests that CNN models exhibit highly
inconsistent performance across different folds of data. Whereas, the 5.1.2. Performance analysis of skull stripped MRI data
accuracy of the MajVot algorithm in RSM data was found to be consis­ The average five-fold test performance of SSM data of ith model
tent and better across all the folds of data. (∀PMi ) is depicted by Equation (3). The five-fold cross-validation clas­
sification performances of SSM data using five CNNs and the MajVot

10
G.S. Tandel et al. Biomedical Signal Processing and Control 78 (2022) 104018

Table A4
Five-fold test results of RSM data using GoogleNet.
#Round TP TN FP FN ACC SEN SPC AUC PPV NPV ITR TIME

1 161 107 4 5 96.75 96.99 96.40 96.69 97.58 95.54 7200.00 44.00
2 159 102 6 10 94.22 94.08 94.44 94.26 96.36 91.07 7200.00 88.00
3 160 101 5 11 94.22 93.57 95.28 94.43 96.97 90.18 7200.00 102.00
4 157 102 10 10 92.83 94.01 91.07 92.54 94.01 91.07 7200.00 131.00
5 162 109 3 3 97.83 98.18 97.32 97.75 98.18 97.32 7200.00 111.00
AVG 159.80 104.20 5.60 7.80 95.17 95.37 94.90 95.13 96.62 93.04 7200.00 95.20
SD 1.92 3.56 2.70 3.56 2.05 2.08 2.40 2.08 1.61 3.18 0.00 32.60

Table A5
Five-fold test results of RSM data using ResNet50.
#Round TP TN FP FN ACC SEN SPC AUC PPV NPV ITR TIME

1 162 108 3 4 97.47 97.59 97.30 97.44 98.18 96.43 7200.00 220.00
2 161 111 4 1 98.19 99.38 96.52 97.95 97.58 99.11 7200.00 235.00
3 162 104 3 8 96.03 95.29 97.20 96.25 98.18 92.86 7200.00 236.00
4 163 105 2 7 96.75 95.88 98.13 97.01 98.79 93.75 7200.00 336.00
5 162 107 3 5 97.11 97.01 97.27 97.14 98.18 95.54 7200.00 246.00
AVG 162.00 107.00 3.00 5.00 97.11 97.01 97.27 95.34 98.18 95.54 7200.00 254.60
SD 0.71 2.74 0.71 2.74 0.81 1.60 0.57 0.63 0.43 2.45 0.00 46.44

Table A6
Five-fold test results of RSM data using MajVot Algorithm.
#Round TP TN FP FN ACC SEN SPC AUC PPV NPV

1 164 110 1 2 98.92 98.80 99.10 98.95 99.39 98.21


2 164 112 1 0 99.64 100.00 99.12 99.56 99.39 100.00
3 163 109 2 3 98.19 98.19 98.20 98.20 98.79 97.32
4 164 111 1 1 99.28 99.39 99.11 99.25 99.39 99.11
5 165 110 0 2 99.28 98.80 100.00 99.40 100.00 98.21
AVG 164.00 110.40 1.00 1.60 99.06 99.04 99.10 99.07 99.39 98.57
SD 0.71 1.14 0.71 1.14 0.55 0.69 0.64 0.54 0.43 1.02

Table B1
Five-fold test results of SSM data using AlexNet.
#Round TP TN FP FN ACC SEN SPC AUC PPV NPV ITR TIME

1 124 123 0 1 99.60 99.20 100.00 99.60 100.00 99.19 9900 46.00
2 113 121 10 4 94.35 96.58 92.37 94.47 91.87 96.80 9900 45.00
3 98 123 25 2 89.11 98.00 83.11 90.55 79.67 98.40 9900 47.00
4 116 121 7 4 95.56 96.67 94.53 95.60 94.31 96.80 9900 55.00
5 117 123 6 2 96.77 98.32 95.35 96.83 95.12 98.40 9900 54.00
AVG 113.60 122.20 9.60 2.60 95.08 97.75 93.07 95.41 92.20 97.92 9900.00 49.40
SD 9.61 1.10 9.34 1.34 3.86 1.12 6.23 3.32 7.60 1.07 0.00 4.72

Table B2
Five-fold test results of SSM data using VGG16.
#Round TP TN FP FN ACC SEN SPC AUC PPV NPV ITR TIME

1 125 118 1 4 97.98 96.90 99.16 98.03 99.21 96.72 9900 315.00
2 121 123 2 2 98.39 98.37 98.40 98.39 98.37 98.40 9900 313.00
3 120 123 3 2 97.98 98.36 97.62 97.99 97.56 98.40 9900 341.00
4 118 122 5 3 96.77 97.52 96.06 96.79 95.93 97.60 9900 394.00
5 123 121 0 4 98.39 96.85 100.00 98.43 100.00 96.80 9900 354.00
AVG 121.40 121.40 2.20 3.00 97.90 97.60 98.25 97.92 98.22 97.58 9900.00 343.40
SD 2.70 2.07 1.92 1.00 0.66 0.75 1.51 0.66 1.57 0.82 0.00 33.20

algorithm are represented in Table 6. The highest classification perfor­ GoogleNet, and ResNet50, and MajVot algorithm are given in Table B1,
mance was observed using the MajVot algorithm and is as follows ACC: Table B2, Table B3, Table B4, Table B5, and Table B6 in Appendix-B,
98.63 ± 0.46, SEN: 98.70 ± 0.44, SPC:98.57 ± 0.66, AUC: 98.63 ± respectively. The behavior of the five-fold test accuracy of each model
0.46, PPV:98.54 ± 0.68, and NPV:98.72 ± 0.44. MajVot algorithm is shown in Fig. 9. This demonstrates that CNN models perform incon­
significantly improves the classification accuracy of SSM data by 3.60 %, sistently across data folds. In contrast, the MajVot algorithm’s accuracy
0.74 %, 1.72 %, 1.31 % and 2.86 % against AlexNet, VGG16, ResNet18, for SSM data was found to be consistent and improved across all folds of
GoogleNet, and ResNet50 models, respectively. The performance of five data.
CNNs against the MajVot algorithm is compared in Fig. 8. The detailed
performance of five tests set using AlexNet, VGG16, ResNet18,

11
G.S. Tandel et al. Biomedical Signal Processing and Control 78 (2022) 104018

Table B3
Five-fold test results of SSM data using ResNet18.
#Round TP TN FP FN ACC SEN SPC AUC PPV NPV ITR TIME

1 117 125 6 0 97.58 100.00 95.42 97.71 95.12 100.00 9900 146.00
2 116 125 7 0 97.18 100.00 94.70 97.35 94.31 100.00 9900 92.00
3 117 122 6 3 96.37 97.50 95.31 96.41 95.12 97.60 9900 117.00
4 121 121 2 4 97.58 96.80 98.37 97.59 98.37 96.80 9900 86.00
5 114 124 9 1 95.97 99.13 93.23 96.18 92.68 99.20 9900 112.00
AVG 117.00 123.40 6.00 1.60 96.94 98.69 95.41 97.05 95.12 98.72 9900.00 110.60
SD 2.55 1.82 2.55 1.82 0.73 1.47 1.87 0.70 2.07 1.45 0.00 23.70

Table B4
Five-fold test results of SSM data using GoogleNet.
#Round TP TN FP FN ACC SEN SPC AUC PPV NPV ITR TIME

1 120 117 3 8 95.56 93.75 97.50 95.63 97.56 93.60 9900 54.00
2 124 120 2 2 98.39 98.41 98.36 98.39 98.41 98.36 9900 98.00
3 120 124 3 1 98.39 99.17 97.64 98.41 97.56 99.20 9900 112.00
4 123 121 0 4 98.39 96.85 100.00 98.43 100.00 96.80 9900 141.00
5 120 118 3 7 95.97 94.49 97.52 96.00 97.56 94.40 9900 121.00
AVG 121.40 120.00 2.20 4.40 97.34 96.53 98.20 97.37 98.22 96.47 9900.00 105.20
SD 1.95 2.74 1.30 3.05 1.44 2.37 1.06 1.43 1.06 2.43 0.00 32.60

Table B5
Five-fold test results of SSM data using ResNet50.
#Round TP TN FP FN ACC SEN SPC AUC PPV NPV ITR TIME

1 116 125 7 0 97.18 100.00 94.70 97.35 94.31 100 9900 230.00
2 112 124 11 1 95.16 99.12 91.85 95.48 91.06 99.2 9900 245.00
3 115 123 8 2 95.97 98.29 93.89 96.09 93.50 98.4 9900 246.00
4 113 122 10 3 94.76 97.41 92.42 94.92 91.87 97.6 9900 346.00
5 115 123 8 2 95.97 98.29 93.89 96.09 93.50 98.4 9900 256.00
AVG 114.20 123.40 8.80 1.60 95.81 98.62 93.35 95.99 92.85 98.72 9900.00 264.60
SD 1.64 1.14 1.64 1.14 0.93 0.98 1.17 0.90 1.34 0.91 0.00 46.44

Table B6
Five-fold test results of SSM data using MajVot Algorithm.
#Round TP TN FP FN ACC SEN SPC AUC PPV NPV

1 122 123 1 2 98.79 98.39 99.19 98.79 99.19 98.4


2 121 124 2 1 98.79 99.18 98.41 98.80 98.37 99.2
3 120 123 3 2 97.98 98.36 97.62 97.99 97.56 98.4
4 121 123 2 2 98.39 98.37 98.40 98.39 98.37 98.4
5 122 124 1 1 99.19 99.19 99.20 99.19 99.19 99.2
AVG 121.20 123.40 1.80 1.60 98.63 98.70 98.57 98.63 98.54 98.72
SD 0.84 0.55 0.84 0.55 0.46 0.44 0.66 0.46 0.68 0.44

∑5 each model is shown in Fig. 11. This demonstrates that CNN models
R=1 PMi (SSM)
∀PMi = (3) perform quite unevenly when applied to various data folds. While it was
5
discovered that the MajVote algorithm’s accuracy for WBM data was
5.1.3. Performance analysis of whole-brain MRI data more consistent across all data folds.
The average performance of five-fold test sets of WBM data of ith ∑5
PM (WBM)
model (∀PMi ) is depicted by Equation (3). The five-fold cross-validation ∀PMi = R=1 i (4)
5
classification performances of WBM data using five CNNs and the Maj­
Vot algorithm are depicted in Table 7. The highest classification per­ If we compare the result of the above three datasets, we get the
formance was observed using the MajVot algorithm and is as follows highest accuracy of RSM, SSM, and WSM data at 99.06 %, 98.63 %, and
ACC: 98.43 ± 0.89, SEN: 98.33 ± 1.25, SPC:98.57 ± 1.04, AUC: 98.45 98.43 % respectively. where the highest accuracy of 99.06 % was
observed in the ROI segmented data. In addition, the lowest accuracy
± 0.88, PPV:98.84 ± 0.84, and NPV: 97.95 ± 1.53. The MajVot algo­ was observed in the whole-brain MRI data at 98.43 %. Therefore, the
rithm significantly improves the classification accuracy of WBM data by RSM data showed an improvement of 0.43 % and 0.65 % accuracy over
2.26 %, 2.77 %, 1.24 %, 3.93 % and 1.97 % against AlexNet, VGG16, the SSM and WSM data, respectively. Some intermediate confusion
ResNet18, GoogleNet, and ResNet50, respectively.The performance of matrixes of the MajVot algorithm and some intermediate training curves
five CNNs against the MajVot algorithm is compared in Fig. 10. The of model ResNet18 are depicted in Fig. D1 and Fig. D2 in Appendix-D,
detailed results of five test sets for WBM datausingAlexNet, VGG16, respectively.
ResNet18, GoogleNet, and ResNet50, and MajVot algorithm are given in
Table C1, Table C2, Table C3, Table C4, Table C5, and Table C6 in
Appendix-C, respectively. The behavior of the five-fold test accuracy of

12
G.S. Tandel et al. Biomedical Signal Processing and Control 78 (2022) 104018

Fig. 9. Accuracy behavior of SSM Data in the five-fold test performance.

Fig. 10. Model-wise performance comparison of WBM Data.

Table C1
Five-fold test results of WBM data using AlexNet.
#Round TP TN FP FN ACC SEN SPC AUC PPV NPV ITR TIME

1 150 122 6 3 96.80 96.15 97.60 96.88 98.04 95.31 11200.00 53.00
2 151 120 5 5 96.44 96.79 96.00 96.40 96.79 96.00 11200.00 52.00
3 149 117 7 8 94.66 95.51 93.60 94.56 94.90 94.35 11200.00 60.00
4 150 120 6 5 96.09 96.15 96.00 96.08 96.77 95.24 11200.00 64.00
5 148 119 8 6 95.02 96.10 93.70 95.34 94.87 95.20 11200.00 54.00
AVG 149.60 119.60 6.40 5.40 95.80 96.14 95.38 95.85 96.28 95.22 11200.00 56.60
SD 1.14 1.82 1.14 1.82 0.92 0.45 1.71 0.91 1.37 0.58 0.00 5.18

13
G.S. Tandel et al. Biomedical Signal Processing and Control 78 (2022) 104018

Table C2
Five-fold test results of WBM data using VGG16.
#Round TP TN FP FN ACC SEN SPC AUC PPV NPV ITR TIME

1 156 122 0 3 98.93 100.00 97.60 98.80 98.11 100.00 11200.00 361.00
2 154 123 2 2 98.58 98.72 98.40 98.56 98.72 98.40 11200.00 341.00
3 151 120 5 5 96.44 96.79 96.00 96.40 96.79 96.00 11200.00 360.00
4 150 125 6 0 97.86 96.15 100.00 98.08 100.00 95.42 11200.00 375.00
5 149 124 7 1 97.15 99.33 94.66 95.34 95.51 99.20 11200.00 350.00
AVG 152.00 122.80 4.00 2.20 97.79 98.20 97.33 97.43 97.83 97.80 11200.00 357.40
SD 2.92 1.92 2.92 1.92 1.02 1.65 2.08 1.50 1.73 2.00 0.00 12.78

Table C3
Five-fold test results of WBM data using ResNet18.
#Round TP TN FP FN ACC SEN SPC AUC PPV NPV ITR TIME

1 151 124 5 1 97.86 96.79 99.20 98.00 99.34 96.12 11200.00 98.00
2 152 124 4 1 98.22 97.44 99.20 98.32 99.35 96.88 11200.00 101.00
3 153 124 3 1 98.58 98.08 99.20 98.64 99.35 97.64 11200.00 110.00
4 152 123 4 2 97.86 98.70 96.85 95.34 97.44 98.40 11200.00 115.00
5 149 120 7 5 95.73 96.75 94.49 95.34 95.51 96.00 11200.00 108.00
AVG 151.40 123.00 4.60 2.00 97.65 97.55 97.79 97.13 98.20 97.01 11200.00 106.40
SD 1.52 1.73 1.52 1.73 1.11 0.84 2.11 1.65 1.71 1.02 0.00 6.88

Table C4
Five-fold test results of WBM data using GoogleNet.
#Round TP TN FP FN ACC SEN SPC AUC PPV NPV ITR TIME

1 150 122 6 3 96.80 96.15 97.60 96.88 98.04 95.31 11200.00 115.00
2 143 121 13 4 93.95 91.67 96.80 94.23 97.28 90.30 11200.00 103.00
3 154 121 2 4 97.86 98.72 96.80 97.76 97.47 98.37 11200.00 120.00
4 145 120 11 5 94.31 96.67 91.60 95.34 92.95 96.00 11200.00 126.00
5 148 121 8 4 95.73 97.37 93.80 95.34 94.87 96.80 11200.00 130.00
AVG 148.00 121.00 8.00 4.00 95.73 96.11 95.32 95.91 96.12 95.36 11200.00 118.80
SD 4.30 0.71 4.30 0.71 1.65 2.67 2.54 1.40 2.15 3.05 0.00 10.52

Table C5
Five-fold test results of WBM data using ResNet18.
#Round TP TN FP FN ACC SEN SPC AUC PPV NPV ITR TIME

1 153 122 3 3 97.86 98.08 97.60 97.84 98.08 97.60 11200.00 263.00
2 145 125 11 0 96.09 92.95 100.00 96.47 100.00 91.91 11200.00 264.00
3 145 125 11 0 96.09 92.95 100.00 96.47 100.00 91.91 11200.00 275.00
4 146 124 10 1 96.09 99.32 92.54 95.34 93.59 99.20 11200.00 260.00
5 147 121 9 4 95.37 97.35 93.08 95.34 94.23 96.80 11200.00 280.00
AVG 147.20 123.40 8.80 1.60 96.30 98.92 93.34 95.34 94.36 98.72 11200.00 268.40
SD 3.35 1.82 3.35 1.82 0.93 2.99 3.64 1.03 3.09 3.37 0.00 8.62

Table C6
Five-fold test results of WBM data using MajVot Algorithm.
#Round TP TN FP FN ACC SEN SPC AUC PPV NPV

1 154 125 2 0 99.29 98.72 100.00 99.36 100.00 98.43


2 151 122 5 3 97.15 96.79 97.60 97.20 98.05 96.06
3 152 124 4 1 98.22 97.44 99.20 98.32 99.35 96.88
4 154 125 2 0 99.29 100.00 98.43 99.21 98.72 100.00
5 153 123 3 2 98.22 98.71 97.62 98.16 98.08 98.40
AVG 152.80 123.80 3.20 1.20 98.43 98.33 98.57 98.45 98.84 97.95
SD 1.30 1.30 1.30 1.30 0.89 1.25 1.04 0.88 0.84 1.53

5.2. Experimental protocol 2: Performance improvement of the MajVot (3.93 %) was observed by the MajVot algorithm for the RSM data against
algorithm against other models GoogleNet. The lowest accuracy improvement (0.65 %), was found by
the MajVot algorithm for the WBM data against the VGG16 model.
In this experimental protocol, the accuracy of three datasets of the
(a − b)
MajVot algorithm is compared with that of the other five CNNs and is IMP = × 100 (5)
a
shown in Table 8 and Fig. 12. The percent accuracy improvement (IMP)
between the two models is mathematically expressed by Equation (5),
where variable a is the accuracy of the MajVot algorithm and b denotes
the accuracy of the other relative models. Highestaccuracy improvement

14
G.S. Tandel et al. Biomedical Signal Processing and Control 78 (2022) 104018

Fig. 11. Accuracy behavior of WBM Data in the five-fold test performance.

Fig. 12. Data-wise accuracy improvement of MajVot algorithm against five CNN.

5.3. Experimental protocol 3: average performance of three datasets 5.4. Experimental protocol 4: effect of deep layers on accuracy and
training time
In this experimental protocol, the model-wise average performance
of three datasets has been compared and their comparative performance The purpose of this experimental protocol is to analyze the effect of
is given in Table 9. The average performance of three datasets of a model deep layers on training time and accuracy in increasing the order of
Mi(∀PMi ) is depicted by Equation (6). The MajVot algorithm showed the convolutional layers of five CNNs. Therefore, the model-wise effect of
maximum average performance of three datasets and is as follows; ACC: convolutional layers on the mean accuracy and average training time of
98.71, SEN: 98.69, SPC = 98.75, AUC = 98.27, PPV = 98.92, and NPV = the three datasets is shown in Fig. 14 and Fig. 15, respectively. The
98.41. The second-highest performance 18 layer deep model was shown highest accuracy (97.47 %) was observed on 18 layers deep model and
by ResNet18 and the lowest performance was observed using 8 layers the lowest accuracy (95.90 %) was observed on 8 layers network.
deep model AlexNet.The MajVot algorithm showed 2.85 %, 1.39 %, Similarly, the highest average training time (344.73 min) was taken by
1.26 %, 2.66 %, and 2.33 % improvements in the average accuracy of the the 16-layer deep model (VGG16) and the lowest training time (48.47
three datasets over AlexNet, VGG16, ResNet18, GoogleNet, and min) was taken by the 8-layer network AlexNet.This shows that the ef­
ResNet50, respectively. This shows that the performance of CNNs is not fect of deep layers on accuracy and training time is inconsistent.
linear with respect to their deep layers. The ROC of the MajVot algo­
rithm is compared with that of five CNNs, as shown in Fig. 13. The 6. Discussion
MajVot algorithm demonstrated the highest ROC compared to other
models. The Biopsy is the gold standard for tumor grading. However, it has
∑5 ∑ ∑ many limitations such as inherently invasive procedures, sampling er­
PM (RSM) + 5R=1 PMi (SSM) + 5R=1 PMi (WBM) rors, and variability in tumor interpretation. Recently, MRI-based non-
∀PMi = R=1 i (6)
15 invasive methods of tumor grading are playing an increasing role in
glioma grading. Many single data and single model-based studies were
proposed earlier for glioma grading. This study was designed with three

15
G.S. Tandel et al. Biomedical Signal Processing and Control 78 (2022) 104018

Fig. 13. ROC comparison of five CNNs against MajVot algorithm.

Fig. 14. Effect of deep layers of five CNNs on accuracy.

main objectives. First, enhance the classification performance of the five %, 1.26 %, 2.66 %, and 2.33 % improvement in the average accuracy of
DL models for brain tumor classification using a majority voting-based three datasets against AlexNet, VGG16, ResNet18, GoogleNet, and
ensemble algorithm. The second objective is to analyze the segmenta­ ResNet50 models, respectively. Therefore, we can conclude that the
tion effect on brain tumor classification in a deep learning paradigm. MajVot algorithm consistently performed well in all folds of the three
Third, the effect of deep layers on training time and accuracy in datasets and consistently improved the overall performance using the
ascending order of convolutional layers was investigated. opinion of multiple models.
For RSM data, the MajVot algorithm showed the highest classifica­ ROI segmented data (RSM) produced the highest classification per­
tion accuracyof99.06 %, and shown2.26 %, 2.77 %, 1.24 %, 3.93 %, and formance. Whereas, the whole image data (WBM) was shown to have the
1.97 % improvement in the accuracy against AlexNet, VGG16, lowest classification accuracy. However, a very slight difference of 0.63
ResNet18, GoogleNet, and ResNet50, respectively. Further, the MajVot % was observed between the highest and lowest classification accuracy.
algorithm showed maximum accuracy of98.63 % for SSM data and Therefore, we can conclude that the DL model has an incredible ability
depicted 3.60 %, 0.74 %, 1.72 %, 1.31 %, and 2.86 % improvement in to extract appropriate features from a whole-brain image without
the accuracy against AlexNet, VGG16, ResNet18, GoogleNet, and specifying the exact features (tumor features). This hypothesis helps us
ResNet50 models, respectively for the same data. The highest accuracy to develop a fully automated brain tumor grading system where, by
of 98.43 % was seen in WBM data using the MajVot algorithm and seen compromising on minor accuracy, significant computational effort of
2.26 %, 2.77 %, 1.24 %, 3.93 %, and 1.97 % improvement in the ac­ segmentation can be reduced.
curacy against AlexNet, VGG16, ResNet18, GoogleNet, and ResNet50, It was also observed that increasing the order of convolutional layers
respectively. Furthermore, the MajVot algorithm showed 2.85 %, 1.39 produced inconsistent effects on accuracy and training time. The

16
G.S. Tandel et al. Biomedical Signal Processing and Control 78 (2022) 104018

Fig. 15. Effect of deep layers of five CNNs on training time.

Fig. D1. Confusion matrix of five-fold test results of MajVot algorithm on RSM Data.

training time of CNNs is somewhat intuitive and mainly depends on the of images). In order to suppress the overfitting, the following measures
number of filters and intermediate parameters they have. Therefore, the were taken. (1) Data augmentation, is the first and foremost choice of
model ’VGG16′ shows the highest training time as it has the highest researchers to suppress overfitting. In this process, data samples are
parameter size (138 million) as shown in Table 4. However, CNN’s artificially increased using image transformation operations. Image
prediction of layer size-related accuracy is highly unpredictable and can rotation and scaling operations were well adopted in this study as a data
be a good research topic. augmentation operation. (2) Introduced dropout layers in the DL
models. In this concept, some layers of networks are randomly discon­
nected between feed-forward networks so that information flow be­
6.1. Challenges of deep learning models tween the layers is hampered. The selected models have implicitly
included dropout layers in the model architecture. (3) K-fold cross-
DL models are shows maximum performance on a large dataset (i.e. validation, is also reflected well in the data characteristics in terms of
Millions of images).In contrast, the limited medical dataset is a major K-fold performance. In our study, the five-fold cross-validation method
barrier in medical research. However, obtaining quality medical data was adopted and produced quite varied results. It shows that overfitting
involves many difficulties such as high expenses, privacy issues, has been suppressed to a greater extent.
requiring patient consent, etc. Therefore, this bottleneck can be avoided
via a pre-trained network using transfer learning techniques to fine-tune
the model on available limited data. Overfitting another problem
occurred, while the training of DL models on limited data (i.e. thousands

17
G.S. Tandel et al. Biomedical Signal Processing and Control 78 (2022) 104018

Fig. D2. Sample training curves of ResNet18 on RSM Data.

6.2. Strength, limitations, and future enhancement due to human intervention. Therefore, we suggest that it may be the
user’s choice to choose a fully automated approach with little compro­
In this study, we compared ROI segmentation and partial segmen­ mise performance or a semi-automated brain tumor grading system with
tation (skull stripping) with whole-brain images for brain tumor classi­ high classification performance. Another contribution of this study is the
fication. Where we have observed a very narrow difference between the proposed MajVot Ensemble algorithm that significantly enhances the
ROI segmentation method with the whole brain image (without seg­ performance of the five DL models for gliomas classification. This
mentation). However, the segmentation approach makes the system technique proved to be better than the single independent multiple DL
quite computationally expensive and requires additional human inter­ model. Furthermore, the study was performed on a single institution
vention. As a result, the tumor grading tool becomes semi-automated dataset. The actual novelty of the proposed method can be tested on

18
G.S. Tandel et al. Biomedical Signal Processing and Control 78 (2022) 104018

multi-institutional data or in the actual scenario on the trained model of interests or personal relationships that could have appeared to influence
sufficiently large data (millions of images) of training. The proposed the work reported in this paper.
MajVot was ensembled on five models, this could be extended to any n
number of odd models. The performance of the MajVot algorithm is still Data availability statements
due to the varied number of models and can be a good topic for further
study. The datasets generated during and/or analyzed during the current
study are available in the TCIA.
6.3. Benchmarking of the proposed method with existing state-of-the-art (REMBRANDT)repository, https://wiki.cancerimagingarchive.
methods net/display/Public/REMBRANDT.

The proposed MajVot algorithm has been applied to three types of Appendix. -A
segmented datasets and compared with existing state-of-the-art methods
in Table 10. The proposed MajVot algorithm on the above three datasets See Tables A1-A6.
has shown excellent performance compared to the existing methods. A
detailed discussion of existing methods is given in Literature Review Appendix. -B
Section 2.
See Tables B1-B5.
7. Conclusion
Appendix. -C
This study is proposed for MRI-based brain tumor classification.
Three categories of segmentation methods such as ROI segmentation,
See Tables C1-C6.
partial segmentation (skull separation), and whole-brain image (without
segmentation) were compared. The above dataset was tested on five
deep learning models. The performance of the five deep learning models Appendix. –D
was further optimized using a majority voting-based ensemble algo­
rithm (MajVot). The proposed majority voting algorithm demonstrated See Figs. D1-D2.
highly optimized classification for the above datasets and performed
consistently across each fold of the data. The MajVot algorithm showed References
2.85 %, 1.39 %, 1.26 %, 2.66 %, and 2.33 % improvement in the average
[1] WHO Statistics of Brain Cancer, (n.d.). http://www.who.int/cancer/en/ (accessed
accuracy of the above three datasets over the AlexNet, VGG16, May 15, 2021).
ResNet18, GoogleNet, and ResNet50 models, respectively. Using the [2] A. Perry, P. Wesseling, Histologic classification of gliomas, 2016, pp. 71–95. 10
MaVotalgorithm, the highest classification accuracy (99.06 %) was .1016/B978-0-12-802997-8.00005-0.
[3] M.N. Gurcan, L.E. Boucheron, A. Can, A. Madabhushi, N.M. Rajpoot, B. Yener,
observed in the ROI segmented MRI data and the least accuracy (98.43 Histopathological Image Analysis: A Review, IEEE Rev. Biomed. Eng. 2 (2009)
%) was observed in the whole brain MRI data. However, only a 0.63 % 147–171, https://doi.org/10.1109/RBME.2009.2034865.
accuracy improvement was observed in the ROI segmentation data [4] D.N. Louis, A. Perry, G. Reifenberger, A. von Deimling, D. Figarella-Branger, W.K.
Cavenee, H. Ohgaki, O.D. Wiestler, P. Kleihues, D.W. Ellison, The 2016 World
against the whole brain image data. We know that any segmentation Health Organization Classification of Tumors of the Central Nervous System: a
approach makes the system computationally expensive and requires summary, Acta Neuropathol., 131 (2016) 803–820. Doi: 10.1007/s00401-016-1
additional human intervention. As a result, the tumor grading tool be­ 545-1.
[5] S. Lapointe, A. Perry, N.A. Butowski, Primary brain tumours in adults, The Lancet.
comes semi-automated due to human intervention. On the other hand, 392 (10145) (2018) 432–446, https://doi.org/10.1016/S0140-6736(18)30990-5.
whole-brain image data is designed to be a fully automated tumor [6] C. Ge, I.-H. Gu, A.S. Jakola, J. Yang, Deep semi-supervised learning for brain tumor
grading tool with few compromises in performance. Therefore, it may be classification, BMC Med. Imaging 20 (1) (2020), https://doi.org/10.1186/s12880-
020-00485-0.
the user’s choice to choose a fully automated approach, with little
[7] D.W. Parsons, Siaˆn Jones, X. Zhang, J.-H. Lin, R.J. Leary, P. Angenendt, P. Mankoo,
compromise performance, or to choose a semi-automated brain tumor H. Carter, I.-M. Siu, G.L. Gallia, A. Olivi, R. McLendon, B.A. Rasheed, S. Keir,
grading system with high classification performance. Another observa­ T. Nikolskaya, Y. Nikolsky, D.A. Busam, H. Tekleab, L.A. Diaz, J. Hartigan, D.
tion made by these experiments was that the five DL models showed R. Smith, R.L. Strausberg, S.K.N. Marie, S.M.O. Shinjo, H. Yan, G.J. Riggins, D.
D. Bigner, R. Karchin, N. Papadopoulos, G. Parmigiani, B. Vogelstein, V.
inconsistent classification performance with respect to the convolutional E. Velculescu, K.W. Kinzler, An Integrated Genomic Analysis of Human
layers they had. Therefore, we can conclude that the performance of Glioblastoma Multiforme, Science 321 (5897) (2008) 1807–1812.
CNN is not related to the layer depth of the model and may vary ac­ [8] B. Kaminska, B. Czapski, R. Guzik, S. Król, B. Gielniewski, Consequences of IDH1/2
mutations in gliomas and an assessment of inhibitors targeting mutated IDH
cording to the nature of the data. In this scenario, it may be a good idea proteins, Molecules 24 (5) (2019) 968, https://doi.org/10.3390/
to use a majority voting algorithm to make use of the combined potential molecules24050968.
of multiple models. [9] C. Hartmann, B. Hentschel, W. Wick, D. Capper, J. Felsberg, M. Simon,
M. Westphal, G. Schackert, R. Meyermann, T. Pietsch, G. Reifenberger, M. Weller,
M. Loeffler, A. von Deimling, Patients with IDH1 wild type anaplastic astrocytomas
Funding exhibit worse prognosis than IDH1-mutated glioblastomas, and IDH1 mutation
status accounts for the unfavorable prognostic effect of higher age: Implications for
classification of gliomas, Acta Neuropathol. 120 (6) (2010) 707–718, https://doi.
No financial support has been received from any institution for this org/10.1007/s00401-010-0781-z.
research work. [10] S.A. Abdelaziz Ismael, A. Mohammed, H. Hefny, An enhanced deep learning
approach for brain cancer MRI images classification using residual networks, Artif.
Intell. Med. 102 (2020) 101779, https://doi.org/10.1016/j.artmed.2019.101779.
CRediT authorship contribution statement
[11] E.I. Zacharaki, S. Wang, S. Chawla, D. Soo Yoo, R. Wolf, E.R. Melhem,
C. Davatzikos, Classification of brain tumor type and grade using MRI texture and
Gopal S. Tandel: Conceptualization, Investigation, Methodology, shape in a machine learning scheme, Magn. Reson. Med. 62 (6) (2009) 1609–1618,
https://doi.org/10.1002/mrm.22147.
Software, Writing – original draft. Ashish Tiwari: Validation, Supervi­
[12] V.P. Gladis Pushpa Rathi, S. Palani, Brain tumor detection and classification using
sion. O.G. Kakde: Validation, Supervision. deep learning classifier on MRI images, Research Journal of Applied Sciences, Eng.
Technol. 10 (2015).
Declaration of Competing Interest [13] H.B. Nandpuru, S.S. Salankar, V.R. Bora, MRI brain cancer classification using
support vector machine, in: 2014 IEEE Students’ Conference on Electrical,
Electronics and Computer Science, SCEECS 2014 (2014) 1–6. Doi: 10.1109/
The authors declare that they have no known competing financial SCEECS.2014.6804439.

19
G.S. Tandel et al. Biomedical Signal Processing and Control 78 (2022) 104018

[14] G. Palareti, C. Legnani, B. Cosmi, E. Antonucci, N. Erba, D. Poli, S. Testa, [31] S. Deepak, P.M. Ameer, Brain tumor classification using deep CNN features via
A. Tosetto, Comparison between different D-Dimer cutoff values to assess the transfer learning, Comput. Biol. Med. 111 (2019) 103345, https://doi.org/
individual risk of recurrent venous thromboembolism: analysis of results obtained 10.1016/j.compbiomed.2019.103345.
in the DULCIS study, Int. J. Lab. Hematol. 38 (2016) 42–49, https://doi.org/ ́
[32] M.M. Badža, M. Markoˇ, M. Barjaktarovicbarjaktarovi ́ Classification of Brain
c,
10.1111/ijlh.12426. Tumors from MRI Images Using a Convolutional Neural Network, (n.d.). Doi:
[15] G.S. Tandel, M. Biswas, O.G. Kakde, A. Tiwari, H.S. Suri, M. Turk, J. Laird, 10.3390/app10061999.
C. Asare, A.A. Ankrah, N.N. Khanna, B.K. Madhusudhan, L. Saba, J.S. Suri, [33] Z. Zhu, S. Lu, S.-H. Wang, J.M. Gorriz, Y.-D. Zhang, DSNN: A DenseNet-Based SNN
A review on a deep learning perspective in brain cancer classification, Cancers for Explainable Brain Disease Classification, Front. Syst. Neurosci. 16 (2022),
(Basel). 11 (1) (2019) 111, https://doi.org/10.3390/cancers11010111. https://doi.org/10.3389/fnsys.2022.838822.
[16] M.I. Sharif, J.P. Li, M.A. Khan, M.A. Saleem, Active deep neural network features [34] B.J. Erickson, P. Korfiatis, Z. Akkus, T.L. Kline, Machine Learning for Medical
selection for segmentation and recognition of brain tumors using MRI images, Imaging, RadioGraphics. 37 (2) (2017) 505–515, https://doi.org/10.1148/
Pattern Recogn. Lett. 129 (2020) 181–189, https://doi.org/10.1016/j. rg.2017160130.
patrec.2019.11.019. [35] D.W. Scarpace, Lisa, Flanders, Adam E., Jain, Rajan, Mikkelsen, Tom, & Andrews,
[17] N. Hu, R. Richards, R. Jensen, Role of chromosomal 1p/19q co-deletion on the Public Data (REMBRANDT), 2015. Doi: doi.org/10.7937/K9/
prognosis of oligodendrogliomas: A systematic review and meta-analysis, TCIA.2015.588OZUZB.
Interdisciplinary, Neurosurgery. 5 (2016) 58–63, https://doi.org/10.1016/j. [36] K. Clark, B. Vendt, K. Smith, J. Freymann, J. Kirby, P. Koppel, S. Moore, S. Phillips,
inat.2016.06.008. D. Maffitt, M. Pringle, L. Tarbox, F. Prior, The Cancer Imaging Archive (TCIA):
[18] E.I. Papageorgiou, P.P. Spyridonos, D.T. Glotsos, C.D. Stylios, Brain tumor Maintaining and Operating a Public Information Repository, J. Digit. Imaging 26
characterization using the soft computing technique of fuzzy cognitive maps, 8 (6) (2013) 1045–1057, https://doi.org/10.1007/s10278-013-9622-7.
(2008) 820–828. Doi: 10.1016/j.asoc.2007.06.006. [37] D.W. Shattuck, R.M. Leahy, (2002). BrainSuite: an automated cortical su, 6(2),
[19] D. Alis, O. Bagcilar, Y.D. Senli, C. Isler, M. Yergin, N. Kocer, C. Islak, O. Kizilkilic, 129-142., BrainSuite, (2019).
The diagnostic value of quantitative texture analysis of conventional MRI [38] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444,
sequences using artificial neural networks in grading gliomas, Clin. Radiol. 75 (5) https://doi.org/10.1038/nature14539.
(2020) 351–357, https://doi.org/10.1016/j.crad.2019.12.008. [39] A. Rehman, S. Ahmad, C. Bukhari, Multimodal Brain Tumor Classification Using
[20] A. Ditmer, B. Zhang, T. Shujaat, A. Pavlina, N. Luibrand, M. Gaskill-Shipley, Deep Learning and Robust Feature Selection : A Machine Learning Application for
A. Vagal, Diagnostic accuracy of MRI texture analysis for grading gliomas, Radiologists, (2020) 1–19.
J. Neurooncol. 140 (3) (2018) 583–589, https://doi.org/10.1007/s11060-018- [40] S. Nalawade, G.K. Murugesan, M. Vejdani-Jahromi, R.A. Fisicaro, C.G. Bangalore
2984-4. Yogananda, B. Wagner, B. Mickey, E. Maher, M.C. Pinho, B. Fei, A.
[21] T. Zhan, P. Feng, X. Hong, Z. Lu, L. Xiao, Y. Zhang, E.J. Ciaccio, F. Liu, An J. Madhuranthakam, J.A. Maldjian, Classification of brain tumor isocitrate
automatic glioma grading method based on multi-feature extraction and fusion, dehydrogenase status using MRI and deep learning, Journal of Medical, Imaging. 6
Technol. Health Care 25 (2017) 377–385, https://doi.org/10.3233/THC-171341. (04) (2019) 1, https://doi.org/10.1117/1.JMI.6.4.046003.
[22] K. Skogen, A. Schulz, J.B. Dormagen, B. Ganeshan, E. Helseth, A. Server, Diagnostic [41] P. Bulla, L. Anantha, S. Peram, Deep Neural Networks with Transfer Learning
performance of texture analysis on MRI in grading cerebral gliomas, Eur. J. Radiol. Model for Brain Tumors Classification, Traitement Du Signal. 37 (4) (2020)
85 (4) (2016) 824–829, https://doi.org/10.1016/j.ejrad.2016.01.013. 593–601, https://doi.org/10.18280/ts.370407.
[23] P. Korfiatis, B. Erickson, Deep learning can see the unseeable: predicting molecular [42] R.C. Suganthe, G. Revathi, S. Monisha, R. Pavithran, Deep learning based brain
markers from MRI of brain gliomas, Clin. Radiol. 74 (5) (2019) 367–373, https:// tumor classification using magnetic resonance imaging, J. Crit. Rev. 7 (2020),
doi.org/10.1016/j.crad.2019.01.028. https://doi.org/10.31838/jcr.07.09.74.
[24] G.S. Tandel, A. Balestrieri, T. Jujaray, N.N. Khanna, L. Saba, J.S. Suri, Multiclass [43] A.M. Sarhan, Brain Tumor Classification in Magnetic Resonance Images Using
magnetic resonance imaging brain tumor classification using artificial intelligence Deep Learning and Wavelet Transform, J. Biomed. Sci. Eng. 13 (06) (2020)
paradigm, Comput. Biol. Med. 122 (2020) 103804, https://doi.org/10.1016/j. 102–112, https://doi.org/10.4236/jbise.2020.136010.
compbiomed.2020.103804. [44] Z.N.K. Swati, Q. Zhao, M. Kabir, F. Ali, Z. Ali, S. Ahmed, J. Lu, Brain tumor
[25] Y. Yang, L. Yan, X. Zhang, Y. Han, H. Nan, Y. Hu, B. Hu, Glioma Grading on classification for MR images using transfer learning and fine-tuning, Comput. Med.
Conventional MR Images: A Deep Learning Study With Transfer LearningYang, Imaging Graph. 75 (2019) 34–46, https://doi.org/10.1016/j.
Yang, Lin-feng Yan, Xin Zhang, Yu Han, Hai-yan Nan, Yu-chuan Hu, and Bo Hu. compmedimag.2019.05.001.
2018. “Glioma Grading on Conventional MR Images: A Deep Learning Study With [45] S.J. Pan, Q. Yang, A Survey on Transfer Learning, IEEE Trans. Knowl. Data Eng. 22
Transfer, 12 (2018) 1–10. Doi: 10.3389/fnins.2018.00804. (10) (2010) 1345–1359, https://doi.org/10.1109/TKDE.2009.191.
[26] S. Banerjee, S. Mitra, F. Masulli, S. Rovetta, Deep Radiomics for Brain Tumor [46] M.E. Taylor, P. Stone, Transfer Learning for Reinforcement Learning Domains: A
Detection and Classification from Multi-Sequence MRI, ArXiv. (2019) 1–15. htt Survey, 2009.
p://arxiv.org/abs/1903.09240. [47] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A.
[27] M.A. Khan, I. Ashraf, M. Alhaisoni, R. Damaševičius, R. Scherer, A. Rehman, S.A. Karpathy, A. Khosla, M. Bernstein, A.C. Berg, L. Fei-Fei, ImageNet Large Scale
C. Bukhari, Multimodal Brain Tumor Classification Using Deep Learning and Visual Recognition Challenge, (2014). http://arxiv.org/abs/1409.0575.
Robust Feature Selection: A Machine Learning Application for Radiologists, [48] A. Krizhevsky, I. Sutskever, G.E. Hinton, 2012 AlexNet, Advances In Neural
Diagnostics (Basel) 10 (8) (2020) 565, https://doi.org/10.3390/ Information Processing Systems. (2012) 1–9. Doi: https://doi.org/10.1016/j.
diagnostics10080565. protcy.2014.09.007.
[28] S. Khawaldeh, U. Pervaiz, A. Rafiq, R. Alkhawaldeh, Noninvasive Grading of [49] K. Simonyan, A. Zisserman, Very Deep Convolutional Networks for Large-Scale
Glioma Tumor Using Magnetic Resonance Imaging with Convolutional Neural Image Recognition, (2014).
Networks, Appl. Sci. 8 (2017) 27, https://doi.org/10.3390/app8010027. [50] C. Szegedy, Wei Liu, Yangqing Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V.
[29] H.A. Khan, W. Jue, M. Mushtaq, M.U. Mushtaq, Brain tumor classification in MRI Vanhoucke, A. Rabinovich, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D.
image using convolutional neural network, Math. Biosci. Eng. 17 (2020) Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: 2015
6203–6216, https://doi.org/10.3934/MBE.2020328. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2015:
[30] A. Kabir Anaraki, M. Ayati, F. Kazemi, Magnetic resonance imaging-based brain pp. 1–9. Doi: 10.1109/CVPR.2015.7298594.
tumor grades classification and grading via convolutional neural networks and [51] K. He, X. Zhang, S. Ren, J. Sun, Deep Residual Learning for Image Recognition, n.d.
genetic algorithms, Biocybern. Biomed. Eng. 39 (1) (2019) 63–74, https://doi.org/ [52] G.S. Tandel, A. Tiwari, O.G. Kakde, Performance optimisation of deep learning
10.1016/j.bbe.2018.10.004. models using majority voting algorithm for brain tumour classification, Comput.
Biol. Med. 135 (2021) 104564, https://doi.org/10.1016/j.
compbiomed.2021.104564.

20

You might also like