Analyzing Activation Functions With Transfer Learning-Based Layer Customization For Improved Brain Tumor Classification
ABSTRACT Brain tumors pose a significant global health concern, requiring early and accurate detection
for effective treatment. Our study presents a binary brain tumor classification architecture leveraging Deep
Neural Network (DNN) pre-trained models to reduce misclassification rates. We modified five Convolutional
Neural Network (CNN) models using Transfer Learning (TL) and evaluated the effects of seven different
Activation Functions (AF). Our proposed architecture was trained, tested, and validated using the “Br35H:
Brain Tumor Detection 2020” dataset. The results show that our modified DenseNet121 with Swish AF
achieves the best classification performance, with a balanced test accuracy of 99.14% and high scores in
Area Under the Curve (AUC), Cohen’s Kappa, Precision, Recall, F1-Score, and Specificity. The proposed
architecture also demonstrates practical value in improving medical outcomes, enabling radiologists to
focus on complex cases and patient care. It also reduces manual classification time and effort, leading to cost
savings for healthcare providers. Our study highlights the potential of DNN in brain tumor classification,
paving the way for advancements in medical imaging and healthcare technology. The proposed architecture
can be adapted for various medical imaging tasks, making it a valuable tool for medical professionals and
contributing to improved patient outcomes and enhanced healthcare efficiency.
INDEX TERMS Activation functions, brain tumor, pre-trained models, GELU, Mish, SELU, Swish.
I. INTRODUCTION
The brain, a vital component of the Central Nervous System (CNS), plays a crucial role in controlling most bodily functions, including decision-making, processing, coordinating, and transmitting signals to other body parts [1]. The brain’s intricate anatomical structure is essential for its complex functions. However, various CNS disorders can affect the brain, including traumatic brain injury, multiple sclerosis, stroke, developmental abnormalities, and brain tumors [1]. Specifically, a brain tumor is an abnormal growth of cells within the brain or surrounding tissues, which can disrupt normal brain function. Primary brain tumors develop in the brain, but secondary or metastatic brain tumors travel from other body regions to the brain [2]. In a given year, approximately 90,000 persons in the United States (US) undergo a primary brain tumor diagnosis, according to the Central Brain Tumor Registry’s annual report [3], [4]. Brain and other CNS malignancies are the fifth most prevalent type of cancer [4]. More than a million individuals are living with a primary brain tumor diagnosis. At present, around 28,000 children in the US are living with a brain tumor diagnosis. Every year, about 3,400 children (0–14 years old) receive a primary brain tumor diagnosis [5].
Similarly, brain tumors are becoming more common in India [6]. Every year, a growing number of brain tumor cases is documented. While brain tumors can affect individuals of all age groups, certain types may be more common in specific age ranges. In India, both children and adults tend to develop brain tumors. According to the
The associate editor coordinating the review of this manuscript and approving it for publication was Marco Giannelli.
© 2024 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 12, 2024 168707
S. Panigrahi et al.: Analyzing AF With TL-Based Layer Customization
International Association of Cancer Registries (IARC), over 28,000 cases of brain tumors are recorded each year in India [7]. Hence, early and precise detection and diagnosis of brain tumors are essential to ensure effective treatment. Non-invasive imaging techniques, including Computed Tomography (CT), Positron Emission Tomography (PET), and Magnetic Resonance Imaging (MRI), are regarded as quicker and safer ways to diagnose brain tumors than invasive diagnostic methods, such as biopsies or surgical exploration. Among these, MRI scans are considered the gold standard for brain tumor imaging as they provide detailed information about brain tumor location, growth, form, and size in 2D and 3D formats with excellent resolution [8]. Several automatic classification models using Machine Learning (ML) and Deep Learning (DL) techniques have been developed to detect brain tumors in this context. Convolutional Neural Network (CNN), a well-known DL architecture, shows potential performance improvement in medical image classification tasks [9]. Transfer Learning (TL) approaches are also extensively used to improve the accuracy of brain tumor classification [10]. Even with these improvements, the efficacy and acceptance of current approaches in clinical settings are severely limited. The challenges include the inability to detect early-stage tumors accurately, the high variability in tumor appearances among patients, the restricted access to sufficient and diverse data for DL architectures, and the ‘black box’ nature of many DL solutions [11]. The last point presents an exceptionally major difficulty because, in clinical applications, trust-building and turning insights into decisions that can be practically implemented are highly dependent on transparency and interpretability. Our study primarily addresses the pressing need to increase patient survival rates, improve brain tumor detection and diagnostics, and overcome the above-mentioned challenges. The requirement also results from a dearth of thorough research into the many Activation Functions (AF) that can be applied to improve the accuracy and efficiency of brain tumor classification. In this regard, we have considered seven widely used AF, described in detail in Subsection III-B below. Using all these AF, we have modified five pre-trained CNN models, described in detail in Subsection III-C, with our suggested layers to improve the efficacy and performance of binary brain tumor classification. Moreover, each AF has its specific advantages, which are helpful when combined with our modified pre-trained CNN architectures in classifying brain MRI images, ultimately reducing misclassification and saving patient lives. The significant contributions of our work are summarized below:
• We proposed five modified pre-trained models, namely ResNet50V2, InceptionV3, NasNetMobile, Xception, and DenseNet121, by using the TL technique for feature extraction to improve the effectiveness of binary classification of brain tumor MRI images.
• Next, we analyzed our modified models separately with seven diverse AF, namely Swish, Mish, SELU, GELU, PReLU, ReLU, and Tanh. We compared the results using the benchmark brain MRI dataset “Br35H: Brain Tumor Detection 2020” [12] retrieved from the Kaggle data repository.
• More precisely, we designed and validated an efficient modified DenseNet121 CNN architecture, leveraging Swish AF and fine-tuned TL, achieving State-Of-The-Art performance in brain abnormality classification.
Our paper is organized as follows: Section II summarizes the recent related studies. Section III describes the proposed model’s architecture and the different techniques used, including the details of the AF used. Section IV gives details of the experimental setup and a description of the dataset, including the hyperparameter settings. Section V presents the performance of the proposed model in terms of various evaluation metrics such as balanced test accuracies, confusion matrices, and classification reports, including comparison with existing work. Finally, Section VI offers the concluding remarks and some directions for future research.
II. RELATED WORKS
Recently, DL methods have been used for the identification of brain tumors. CNN is a widely used DL technique for image processing applications [13]. The CNN produces a hierarchical feature representation, which can be trained in several phases. Several authors have proposed a wide range of DL approaches for medical image analysis. These studies have investigated the application of DL algorithms to forecast the occurrence or progression of brain tumors accurately [14]. Pashaei et al. used an ensemble technique of CNN and Kernel Extreme Learning Machines (KELM) on the Contrast-Enhanced MRI (CEMRI) image dataset to get an accuracy of 93.68% [15]. Ari Ali et al. used an Extreme Learning Machine Local Receptive Fields (ELM-LRF) based tumor classification technique on cranial MR images to get an accuracy of 97.18% [16]. Das et al. have developed an enhanced CNN model for brain tumor classification on the CEMRI dataset containing 3064 images to get a test accuracy of 94.39% [17]. Ullah et al. have used several image enhancement techniques, such as noise removal using a median filter and Contrast Enhancement (CE) using the Histogram Equalization Technique (HET), together with DNN techniques on a brain MRI image dataset to get an accuracy of 95.8% [18]. Mzoughi et al. proposed a 3-dimensional CNN to classify High-Grade Gliomas (HGG) and Low-Grade Gliomas (LGG) from the BraTS dataset and achieved 96.49% accuracy [19]. Huang et al. proposed a CNNBCN-based model including modified AF to classify MR images. The model achieves an accuracy of 95.49% with only a few steps of preprocessing [20]. Deepak et al. introduced a CNN-SVM framework that can classify brain tumors into three classes. The proposed system achieved a 95.82% accuracy [21]. Naseer et al. proposed a CNN-based computer-aided brain tumor diagnosis system including preprocessing and feature reduction techniques to achieve an
accuracy of 98.80% [22]. Arman et al. aimed to efficiently recognize brain tumors from the other brain tissues, including grey and white matter and Cerebrospinal Fluid (CSF). They achieved an overall accuracy of 94.8% [23]. Divya Shree evaluated the performance of a hybrid DCCN architecture DenseNet169 with image enhancement methods to achieve an accuracy of 93.29% [24]. Vankdothu et al. proposed RCNN to classify MR images as tumors and non-tumors. This method used K-means clustering and the Grey-Level Co-occurrence Matrix (GLCM) for feature extraction. With all these tweaks, the proposed method can achieve a high accuracy of 95.17% [25]. Mahesha et al. suggested a CNN architecture to classify the brain tumor and achieve an accuracy of 97.2% [26]. Gómez-Guzmán et al. analyzed different pre-trained CNN architectures, and InceptionV3 obtained the best accuracy of 97.12% [27]. Ata et al. aimed to create a feature fusion model with more comprehensive textural feature extraction by using GLCM. They designed a concatenation-based model to achieve an accuracy value of 98.22% [28]. Bhardwaj et al. proposed a fine-tuned VGG16 architecture and got an accuracy of 97.00% [29]. Our study is guided by these literature reviews, which discussed the advantages and disadvantages of recent research. More importantly, more evenly distributed datasets are thought to enhance model performance and reinforce generalization capacity. Furthermore, it is considered that better modeling strategies can improve the model’s capacity for generalization. The constraints mentioned above point to a gap that requires attention. In the next section, we showcase our experimental work and techniques, taking inspiration from these constraints.
III. METHODOLOGY
This section gives a concise description of our proposed architecture, the various AF used with the proposed layers, and the Neural Network (NN) training process, including input and output operations with their specific equations.
A. PROPOSED ARCHITECTURE FOR BRAIN TUMOR CLASSIFICATION
Examination of brain MRI images is challenging because tumors can span a wide range of sizes, shapes, and locations. Several researchers have introduced various techniques, described above in Section II. They focused on identifying anomalies in data that are sometimes not directly observable, and each method has pros and cons. The availability of a benchmark dataset that can assess the effectiveness of State-Of-The-Art procedures is critical for the impartial evaluation of these methods. Brain tumor images produced by different devices may vary in sharpness, contrast, number of slices, and pixel spacing. In this scenario, we have described our proposed architecture and technical specifications of the system for binary classification of brain MRI images. The proposed architecture is based on CNN models, and the steps to achieve our tasks, such as preprocessing, enhancements, training, and evaluation, are shown in Fig. 1 below.
B. ACTIVATION FUNCTIONS BASED NETWORK TRAINING AND CLASSIFICATION
The act of fine-tuning an NN’s weights and biases so that it can learn from input data and classify objects accurately is known as network training. In the case of supervised learning, an NN is trained on a labeled dataset known as the training dataset, which has matching target labels along with the input data [30], [31]. By calculating the gradients of the loss with respect to the network’s parameters on the training data and updating the parameters to minimize the loss function, the backpropagation optimization process can help achieve the goal of network training [30]. The goal is to minimize the difference between the predicted output and target labels.
1) NEURAL NETWORK OPERATION
Input: The input data is fed through the NN during the forward pass, and each training example’s prediction is computed [32]. For a single training example, the NN output is denoted as $\hat{y}$; it is computed as follows:
$$\hat{y} = f(x, \theta) \tag{1}$$
where,
$x$ is the input feature.
$\theta$ is the model’s parameters (weights and biases).
Weights are used to scale the input features, and biases are used to shift the AF.
The operation of an NN involves several layers of neurons, each performing linear transformations followed by nonlinear AF. For a single neuron in a given layer, $l$:
$$Z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)} \tag{2}$$
where,
$W^{(l)}$ is the weight matrix for layer $l$.
$a^{(l-1)}$ is the activation from the previous layer.
$b^{(l)}$ is the bias vector for layer $l$.
Again, the activation can be computed as follows:
$$a^{(l)} = \sigma(Z^{(l)}) \tag{3}$$
where,
$\sigma$ is the AF applied element-wise.
The AF introduces non-linearity into the model, enabling it to learn more complex patterns. As we are dealing with MRI images, they contain many complex patterns [33]. To deal with this complexity, we have used seven different AF, namely Tanh, ReLU, PReLU, SELU, GELU, Mish, and Swish, in seven different approaches [34], [35]. The reason behind choosing these seven AF is that they include all varieties of AF, from traditional to modern and advanced AF. Also, testing with various AF allows us to identify which AF is most effective for our proposed architecture and dataset. The choice of AF in DNN is vital for learning
FIGURE 1. Visual depiction of our proposed architecture based on pre-trained CNN models with additional layers and seven
different AF.
complex patterns from various data types. The commonly used AF in Recurrent Neural Networks (RNN) is the Hyperbolic Tangent (Tanh) AF, which is suitable for scenarios where zero-centered data is beneficial. However, it can lead to vanishing gradient problems and added complexity due to exponential calculations [36]. Similarly, the widely used Rectified Linear Unit (ReLU) AF is simple and computationally efficient but suffers from the ‘dying ReLU’ problem [37]. The Parametric Rectified Linear Unit (PReLU) AF is an extension of ReLU, which can mitigate the ‘dying ReLU’ problem with the flexibility of its learnable parameter, making it adaptable to different data distributions [38]. Next, the Scaled Exponential Linear Unit (SELU) AF promotes self-normalizing properties with LeCun normal initialization, making it suitable for deep networks by mitigating vanishing and exploding gradient problems [39]. One more AF is the Gaussian Error Linear Unit (GELU) AF, which incorporates a Gaussian distribution for a probabilistic activation by generating a smooth curve that helps maintain gradient flow and effectively captures complex patterns [40]. Compared to the above AF, ‘Mish’ is one more AF that provides a smooth and non-monotonic curve like GELU. It is based on a combination of Tanh and natural logarithm functions. It promotes self-regularization by encouraging zero-centered outputs and smooth gradient flow [41]. Another AF, ‘Swish,’ has shown potential results in DNN on ImageNet data compared to all the above AF. It uses
FIGURE 2. Graph for (a) Tanh AF (b) ReLU AF (c) PReLU AF (d) SELU AF (e) GELU AF (f) Mish AF (g) Swish AF.
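The seven AF plotted in Fig. 2, together with the layer operation of Eqs. (2) and (3), can be sketched in plain Python as follows. This is a minimal illustration that assumes the standard definitions of each AF from the literature rather than reproducing the implementation used in the experiments. The helper names (`dense_forward`, `softplus`) are hypothetical, the PReLU slope `alpha=0.25` is only a placeholder for a parameter that is learned during training, and Swish is shown with its scaling parameter fixed to 1.

```python
import math

# SELU constants from the self-normalizing networks literature.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z):
    """Stable ln(1 + e^z), used by Mish."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def tanh(z):
    return math.tanh(z)


def relu(z):
    return max(0.0, z)


def prelu(z, alpha=0.25):
    # alpha is learned in practice; 0.25 is an assumed placeholder value.
    return z if z >= 0 else alpha * z


def selu(z):
    return SELU_SCALE * (z if z > 0 else SELU_ALPHA * (math.exp(z) - 1.0))


def gelu(z):
    # Exact form: z * Phi(z), with Phi the standard normal CDF.
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))


def mish(z):
    return z * math.tanh(softplus(z))


def swish(z):
    # Swish with its scaling parameter set to 1.
    return z * sigmoid(z)


def dense_forward(W, a_prev, b, activation):
    """One layer: Z = W a_prev + b (Eq. 2), then a = activation(Z) (Eq. 3)."""
    z = [sum(w * a for w, a in zip(row, a_prev)) + bias
         for row, bias in zip(W, b)]
    return [activation(v) for v in z]
```

For instance, `dense_forward([[1.0, -1.0]], [2.0, 3.0], [0.5], relu)` computes the pre-activation 2 − 3 + 0.5 = −0.5 and returns `[0.0]`, whereas passing `swish` instead lets a small negative value through, which is the non-monotonic behavior visible in panel (g).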
AF might make it necessary to adjust more hyperparameters during training, which would need more computing power and raise costs [43], [44]. Sometimes, how these AF will perform while dealing with complex data needs to be clarified. We aim to find an AF to help NN to learn more accurately and efficiently [45]. Hence, Table 1 below briefly describes each of the seven AF we employed for our brain tumor classification with the suggested layers.
Next is the Loss Function, a critical component that measures how well the model’s predictions match the actual data. The training aims to minimize this loss function by adjusting the model’s parameters to improve its performance [46], [14]. For our implementation process, we used the binary cross-entropy loss (binary log loss) function, a common loss function used in binary classification tasks. It measures the dissimilarity between the predicted probabilities and the actual binary labels. The formula for binary cross-entropy loss is as follows:
$$L(y, \hat{y}) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right] \tag{19}$$
where,
$N$ is the number of samples.
$y_i$ is the actual label for the $i$-th sample.
$\hat{y}_i$ is the predicted probability for the $i$-th sample.
If $y = 1$, the loss term becomes $-\log(\hat{y})$, penalizing low predicted probabilities for the positive class.
If $y = 0$, the loss term becomes $-\log(1 - \hat{y})$, penalizing high predicted probabilities for the positive class.
C. TRANSFER LEARNING PROCESS APPLICATION ON PRE-TRAINED CNN MODELS
TL is a widespread technique in DL that involves using pre-trained models as a starting point for a new task or problem. The key idea behind TL is that features learned by a model from one task can be helpful for related tasks [47], [48]. Typically, there are two main techniques for TL with pre-trained models: feature extraction and fine-tuning. For our implementation work, we have used the feature extraction approach, in which we freeze the weights of the pre-trained model’s layers and use them as fixed feature extractors [49]. We removed the original models’ output layer(s) and added the suggested layers specific to our brain tumor classification task. Our modified pre-trained CNN architectures employed two dense layers with 512 nodes each, followed by a dropout layer and a classification layer with sigmoid AF. This configuration was chosen based on existing relevant literature [19], [20], [21], [22], [23] and intuition from similar CNN-based models. Dense layers enable the model to learn complex representations and interactions between features [24]. Considering two dense layers offers a reasonable trade-off between model complexity and computational efficiency. Too many neurons could lead to overfitting or high computational costs, while too few might not be able to capture more relevant features [27], [29]. Hence, based on our dataset size, complexity, and number of classes, we considered 512 neurons for dense layers. Next, we considered the dropout layer, which is used for regularization. It prevents overfitting by randomly deactivating a fraction of neurons during training [26]. Finally, we considered the classification layer with the sigmoid AF as it is a binary classification task. Furthermore, we only trained the proposed layers while keeping the weights of the pre-trained layers frozen. The substantial benefit of using this approach is that it works well even with less data, leads to faster training, transfers knowledge and learning from one domain to another, and, most importantly, avoids the problem of overfitting [48]. In this context, we have used five widely used pre-trained models, namely ResNet50V2 [50], InceptionV3 [51], NASNetMobile [52], Xception [53], and DenseNet121 [54], with our proposed layers. These are well-known DL models trained on massive datasets such as ImageNet, the benchmark database with over 14 million images across 1,000 classes. These models were constructed for image classification tasks and are used for TL using their weights as they are [55]. This assists in obtaining meaningful features from images across various tasks, including medical imaging [47], [56]. The main reason behind choosing these five pre-trained models is that each represents a variety of State-Of-The-Art DL models with different design principles. For example, the ResNet50V2 model is known for its skip connections technique, which minimizes the vanishing gradient issue and enables deeper networks [50]. The InceptionV3 model employs multi-scale feature extraction with parallel convolutions [51]. The NasNetMobile model is designed through Neural Architecture Search (NAS), optimized for mobile devices, and provides insights into the performance of lightweight models [52]. The Xception model replaces standard convolutions with depthwise separable convolutions [53]. The DenseNet121 model introduces dense connections where each layer receives input from all previous layers, fostering feature reuse and improving gradient flow [54]. So, considering different CNN architectures, we can explore the effects of our proposed layers as described in Subsection III-A above with a wide range of design patterns and feature extraction techniques, enabling a comprehensive comparison of their performance. We have pointed out the details of each pre-trained CNN model in Table 2 below.
In CNN, optimizers are crucial in training the model and updating its parameters (weights and biases) to minimize the loss function [57]. Out of many available optimizers, which have their pros and cons, we have used the Adam (Adaptive Moment Estimation) optimizer for our task [58]. The main reason for using this optimizer is that it distinguishes itself from other optimizers through its adaptive learning rates for individual parameters, synergistically combining the benefits of Momentum and RMSProp. Additionally, its bias correction feature enhances convergence stability during training, reducing the number of iterations needed to reach a satisfactory solution [57]. This versatile approach enables Adam to excel with minimal tuning across diverse tasks [59]. The formula for weight and bias updates using the Adam
FIGURE 3. Sample of images of dataset (a) and (b) normal brain images, (c) and (d) tumor brain images.
TABLE 4. Hyperparameters setting of the proposed model.
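The feature-extraction setup of Subsection III-C (a frozen pre-trained base, two 512-node dense layers, a dropout layer, and a sigmoid classifier trained with Adam and binary cross-entropy) could look roughly like the Keras sketch below. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, global average pooling (`pooling="avg"`) is assumed as the bridge between the base and the dense layers, and the dropout rate of 0.5 and the default Adam learning rate are placeholders because the Table 4 values are not reproduced in this text. TensorFlow is imported lazily inside the builder, so the parameter-count helper can be used without it.

```python
def head_param_count(feature_dim, hidden=512):
    """Trainable parameters of the added head: Dense -> Dense -> Dense(1)."""
    first = feature_dim * hidden + hidden   # weights + biases
    second = hidden * hidden + hidden
    output = hidden * 1 + 1                 # single sigmoid unit
    return first + second + output          # dropout adds no parameters


def build_transfer_model(input_shape=(299, 299, 3),
                         activation="swish",
                         dropout_rate=0.5):
    """Sketch of a frozen DenseNet121 base with the four suggested layers.

    pooling="avg", dropout_rate, and the default Adam settings are
    assumptions made for illustration only.
    """
    import tensorflow as tf  # imported lazily; needed only when building

    base = tf.keras.applications.DenseNet121(
        include_top=False,        # drop the original ImageNet classifier
        weights="imagenet",
        input_shape=input_shape,
        pooling="avg",
    )
    base.trainable = False        # feature extraction: base stays frozen

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(512, activation=activation),
        tf.keras.layers.Dense(512, activation=activation),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Under the pooling assumption, DenseNet121 yields a 1024-dimensional feature vector, so `head_param_count(1024)` gives 787,969 trainable parameters, which is the only part of the network updated during training in this setup.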
B. DATASET DESCRIPTION
The dataset “Br35H: Brain Tumor Detection 2020” [12] consists of 3864 images. At the end of the data preprocessing step, we were left with 2664 images after removing low-quality, duplicate, or irrelevant images. We retained images with sufficient resolution, lighting, and clarity. Finally, we randomly selected 2200 images, keeping resource constraints in mind. Again, to ensure equal representation of classes and prevent bias, we used 1100 MRI images of tumor type (labeled 1) and 1100 MRI images of normal type (labeled 0). We did not use any other specific criteria to select the number of images. Out of 2200 images, by using a train-test-validation split of 75:15:10, we used 1650 images for training, 467 for testing, and 83 for validation from the same dataset, as described in Table 3 below.
C. DATA PREPROCESSING
1) RESIZING AND RESCALING
A dataset may contain images with varying resolutions or sizes. Here, brain MRI images are of different sizes and resolutions, which can reduce accuracy if not appropriately resized. Various resizing techniques ensure compatibility without compromising model performance [60]. Images can be made uniform and compatible for further processing by scaling them to a standardized range [17]. Although each pre-trained CNN model has its own default input image size, as described in Table 2 above, we standardized all images to 299×299 using standard resizing methods, allowing consistent input for all models. This technique ensures minimal loss of image quality while fitting the input size expected by different architectures. Also, Keras (a TensorFlow subsystem) demonstrates remarkable flexibility when input shapes are adjusted. While modifying the input shape from the default 224×224 to 299×299, Keras seamlessly adapts the model architecture to accommodate the new dimensions. This adaptability is a significant advantage of using Keras frameworks [61]. A few samples of the images are displayed in Fig. 3 above.
2) NORMALIZATION
Normalization is a common data preprocessing technique used to scale input features to a consistent range or distribution. Facilitating equal contribution from all input features to the learning process and accelerating the convergence of optimization methods can enhance ML and DL models’ performance and training stability [62]. Out of many normalization techniques, we have used Min-Max Scaling as it maintains the relative relationships between pixel values, which is essential for image processing tasks [63]. It is also computationally simpler, faster, and efficient, especially for image datasets. Furthermore, it is more suitable for images with varying pixel intensity ranges [64]. This technique ensures that the pixel values of the images are scaled to a specific range of [0,1] by using the following formula:
$$x' = \frac{x - \min(x)}{\max(x) - \min(x)} \tag{21}$$
where,
$x$ is the original pixel value.
$x'$ is the normalized pixel value.
$\min(x)$ is the minimum pixel value (0).
$\max(x)$ is the maximum pixel value of 255 for 8-bit images.
The main benefit of using the Min-Max scaling technique is that it ensures all pixel values are on a consistent scale, which helps in convergence during training and prevents large input
FIGURE 5. Confusion matrices for the proposed modified pre-trained CNN architectures with dataset [12] (a) ResNet50V2 model (b) InceptionV3 model
(c) NASNetMobile model (d) Xception model (e) DenseNet121 model using Swish AF.
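The Min-Max scaling of Eq. (21) and the image counts reported in the dataset description can be checked with a short plain-Python sketch. The function names are hypothetical. The two-stage split shown here (75% for training, then 15% of the remaining 550 images for validation, rounded up) is only one assumed reading that happens to reproduce the reported 1650/467/83 counts; a direct 75:15:10 split of 2200 images would instead give 1650/330/220, and the text does not state which procedure was used.

```python
import math


def min_max_scale(pixels, lo=0, hi=255):
    """Eq. (21) with min(x) = 0 and max(x) = 255 for 8-bit images."""
    return [(p - lo) / (hi - lo) for p in pixels]


def direct_split_counts(n_total, ratios=(75, 15, 10)):
    """Counts for a direct train:test:validation split of n_total images."""
    return tuple(n_total * r // 100 for r in ratios)


def two_stage_split_counts(n_total, train_pct=75, val_pct_of_holdout=15):
    """Assumed two-stage split: hold out 25%, then carve validation from it."""
    n_train = n_total * train_pct // 100
    n_holdout = n_total - n_train
    n_val = math.ceil(n_holdout * val_pct_of_holdout / 100)  # rounded up
    n_test = n_holdout - n_val
    return n_train, n_test, n_val


if __name__ == "__main__":
    print(min_max_scale([0, 51, 255]))
    print(direct_split_counts(2200))
    print(two_stage_split_counts(2200))
```

The scaled values stay in [0, 1] and preserve the ordering of the original intensities, which is the property the normalization step relies on.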
FIGURE 6. Comparison of the Cohen’s Kappa Score of pre-trained CNN with proposed layer models among seven AF.
3) HYPERPARAMETERS SETTING
In CNN, AF and hyperparameters are intrinsically linked as they collectively influence the learning process and NN performance [65], [66]. So, along with the various AF, we have analyzed various hyperparameters so that they can lead to a better-performing NN by optimizing both the architecture and learning dynamics of our proposed model. Hyperparameters are parameters set before training that influence the behavior and performance of the model [67], [68]. We conducted extensive hyperparameter tuning using simulation-based iterations, evaluating various combinations of parameters. The optimal configuration, yielding the best performance, is presented in Table 4 below. Additionally, we considered the binary cross-entropy function out of the many loss functions available, as our problem is a binary classification problem. The equation for this is given in Eq. (19) above.
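As a worked illustration of the binary cross-entropy loss of Eq. (19), the following plain-Python sketch averages the per-sample terms over a batch. The function name is hypothetical, and the clipping constant `eps` is an added assumption (it is not part of Eq. (19)) that only guards against taking the logarithm of zero.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y log(p) + (1 - y) log(1 - p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # assumed clipping to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


if __name__ == "__main__":
    # Confident and correct predictions give a small loss.
    print(binary_cross_entropy([1, 0], [0.9, 0.1]))
    # Confident but wrong predictions are penalized heavily.
    print(binary_cross_entropy([1, 0], [0.1, 0.9]))
```

With labels `[1, 0]` and predictions `[0.9, 0.1]`, both terms equal log(0.9), so the loss is −log(0.9); swapping the predictions gives −log(0.1), which shows how strongly the loss penalizes confident errors.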
D. EVALUATION METRICS
After the training phase of the model, its overall performance must be assessed using standardized metrics. These metrics are known as evaluation metrics, which are described as follows:
Confusion Matrix: A confusion matrix summarizes a DL model’s performance on a test dataset. Classification models frequently utilize this approach to assess their accuracy in predicting category labels for input examples [69], [70]. Concerning the test data, the matrix provides a comprehensive overview of the model’s True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN). In binary classification, the matrix takes the shape of a 2×2 table. From this confusion matrix, several evaluation metrics are derived. As per our problem statement, we have considered the following measures:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{22}$$
$$\text{Precision}\,(P) = \frac{TP}{TP + FP} \tag{23}$$
$$\text{Recall or Sensitivity}\,(R) = \frac{TP}{TP + FN} \tag{24}$$
$$\text{Specificity} = \frac{TN}{TN + FP} \tag{25}$$
$$\text{F1-Measure} = \frac{2 \cdot P \cdot R}{P + R} \tag{26}$$
$$\text{MCC} = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}} \tag{27}$$
$$\text{Balanced Test Accuracy} = \frac{\text{Sensitivity} + \text{Specificity}}{2} \tag{28}$$
$$\kappa = \frac{P_0 - P_e}{1 - P_e} \tag{29}$$
where,
$P_0$ is the relative observed agreement among raters (actual agreement).
$P_e$ is the hypothetical probability of chance agreement.
MCC is the Matthews Correlation Coefficient.
FIGURE 7. Comparison of test loss of pre-trained CNN models with suggested layers among various AF.
V. RESULTS AND DISCUSSION
In this section, the experimental results are conscientiously analyzed. Our work aimed to provide an effective method for classifying and identifying brain tumors utilizing various AF with our proposed layers. Our motive was to address the urgent requirement for precise and trustworthy brain tumor detection from brain MRI images to enhance medical diagnosis and formulation of a treatment strategy. Our study focused on promoting brain tumor research by assessing the performance of the suggested model through in-depth analysis and comparison with current methodologies. The 2200 MRI scans that comprised the dataset utilized in our study fell into one of two categories: tumor and no tumor. A 75:15:10 ratio has been used to divide the dataset into training, testing, and validation. Here, validation data refers to a holdout validation approach as we have used
TABLE 5. Comparison of the Balanced Test Accuracy of the CNN models with proposed layers.
FIGURE 8. Training Validation Accuracy / Loss curve of the Proposed model using Swish AF (a) modified ResNet50V2 (b) modified InceptionV3 (c) modified NASNetMobile (d) modified Xception (e) modified DenseNet121.
10% of the data as a fixed validation set for tuning and 15% as a fixed test set for final evaluation. Additionally, our model has been trained on a portion of data and tested on a different portion of the same dataset. Hence, the testing was intra-subject. The input MRI images have been resized and normalized. Then, the images were fed for training into our modified pre-trained CNN architectures with four suggested layers: two dense layers of 512 nodes each, a dropout layer, and a classification layer with sigmoid AF. A total of 35 experiments (five architectures, each with seven AF) have been conducted, and we found that the modified DenseNet121 CNN architecture with proposed layers and Swish AF provides the best balanced test accuracy. The following sub-section compares the confusion matrix results described in IV-D. Additionally, we compared the results of all evaluation metrics of different CNN pre-trained architectures considered for our classification approach in the
FIGURE 9. AUC Score of different CNN architectures, each with (a) Tanh AF (b) ReLU AF (c) PReLU AF (d) GELU AF (e) SELU AF (f) Mish AF (g) Swish AF.
TABLE 6. Details of various evaluation metrics for our proposed layers with Swish AF on different CNN architectures.
FIGURE 10. Different evaluation metrics comparison based on seven AFs on the modified DenseNet121 model.
comparative analysis section. Also, we compare the results with those of the existing approach.
A. COMPARATIVE ANALYSIS
In our study, we explored the classification of brain tumors using various state-of-the-art
pre-trained model architectures, namely ResNet50V2, InceptionV3, NASNetMobile, Xception, and
DenseNet121, each extended with the proposed layers (two dense layers followed by a dropout
layer and a classification layer). We used the ‘‘Br35H: Brain Tumor Detection 2020’’ [12]
dataset. During the classification process, we tested seven different AF, namely Swish, Mish,
SELU, GELU, PReLU, ReLU, and Tanh, separately with our proposed layers. We present all the
results in graphical and tabular forms. In our case, Swish AF outperforms all other AF used,
so confusion matrices for our proposed modified pre-trained CNN models using Swish AF are
provided in Fig. 5.
Balanced accuracy is the average of sensitivity and specificity, providing a balanced
performance measure across both classes, with the formula given in Eq. 28 above. Table 5
compares the balanced accuracy on the brain MRI image classification dataset for the different
pre-trained CNN architectures with the proposed layers. In medical imaging, different
radiologists or classifiers may provide different diagnoses. Cohen’s Kappa is another metric
that may help quantify the level of agreement between different radiologists’ interpretations,
ensuring consistency and reliability in diagnoses [71], [72], [73]. Cohen’s Kappa is a
statistical measure of inter-rater agreement, described in Eq. 29 above, and Fig. 6 reports
this metric for our proposed models.
Besides the metrics mentioned above, the test loss also plays a significant role, as it
measures how effectively the proposed model generalizes to unseen data and how well it will
perform in real-world scenarios. We used binary cross-entropy loss because our model addresses
a binary classification task, as described in Eq. 19 above. Evaluating the test loss helps
ensure that the model is robust to variation in brain MRI images. It also provides a reliable
metric for optimizing hyperparameters and validating the proposed model. A detailed comparison
of the test loss of the proposed CNN architectures with different AF is depicted in Fig. 7.
In CNNs, the model accuracy curve is a graphical depiction showing how the model’s accuracy
changes during training. It is plotted by tracking the model’s performance (mainly accuracy)
on the training and validation datasets over multiple epochs [69], [74]. As we obtained the
best performance using Swish AF, the training-validation accuracy/loss curves for our proposed
architecture on the brain MRI images are shown in Fig. 8.
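The three quantities discussed above (balanced accuracy, Cohen's Kappa, and binary cross-entropy) can be illustrated with a short Python sketch. It uses the standard textbook definitions, which we assume correspond to Eq. 28, Eq. 29, and Eq. 19; the function names and the confusion-matrix counts are hypothetical and are not taken from Table 5 or Fig. 5.

```python
import math

def balanced_accuracy(tp, fn, fp, tn):
    """Average of sensitivity (tumor recall) and specificity (non-tumor recall)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

def cohens_kappa(tp, fn, fp, tn):
    """Agreement between predictions and labels, corrected for chance agreement."""
    n = tp + fn + fp + tn
    p_observed = (tp + tn) / n
    # Chance agreement from the row and column totals of the 2x2 matrix.
    p_expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

# Hypothetical counts for illustration only (not results from this study).
tp, fn, fp, tn = 90, 10, 5, 95
print("balanced accuracy:", balanced_accuracy(tp, fn, fp, tn))
print("Cohen's kappa:", cohens_kappa(tp, fn, fp, tn))
print("BCE:", binary_cross_entropy([1, 0], [0.9, 0.1]))
```

Because both metrics are computed from the same four counts, a classifier that ignores one class can still reach high plain accuracy on an imbalanced set, while its balanced accuracy and Kappa stay low.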
TABLE 7. Overall performance comparison of the proposed model using Swish, Mish, SELU, GELU, PReLU, ReLU, and Tanh AFs in terms of Precision,
Recall, and F1-Score for the Brain Tumor Classification.
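For reference, the seven AFs named in the caption above can be written as scalar functions. This is a minimal sketch based on the commonly used definitions from the cited AF papers [38]-[42], not the implementation used in this study. The Swish parameter `beta=1.0` and the PReLU slope `a=0.25` are illustrative assumptions (in practice the PReLU slope is learned), and the code is not hardened against overflow for large-magnitude inputs.

```python
import math

# SELU constants from the self-normalizing networks formulation.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def relu(x):
    return max(0.0, x)

def prelu(x, a=0.25):
    # 'a' is a learnable slope in practice; 0.25 is only an illustrative value.
    return x if x > 0 else a * x

def gelu(x):
    # Exact form using the Gaussian CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def selu(x):
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))

def mish(x):
    # x * tanh(softplus(x))
    return x * math.tanh(math.log1p(math.exp(x)))

def swish(x, beta=1.0):
    # x * sigmoid(beta * x); beta = 1 is assumed here.
    return x * sigmoid(beta * x)

for name, fn in [("tanh", tanh), ("relu", relu), ("prelu", prelu), ("gelu", gelu),
                 ("selu", selu), ("mish", mish), ("swish", swish)]:
    print(name, [round(fn(v), 4) for v in (-2.0, -1.0, 0.0, 1.0, 2.0)])
```

Unlike ReLU, Swish and Mish return small negative values for negative inputs rather than zero.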
Furthermore, Area Under the Curve (AUC) is another crucial metric for evaluating the
effectiveness of the brain tumor classification model. It measures the classifier’s
performance across all possible classification thresholds, providing a single scalar value
[75], [76], [77]. Fig. 9 shows the AUC score of each proposed CNN pre-trained architecture
with each of the seven different AF separately. Out of the seven different AF that have been
applied to our suggested layers, it was found that Swish AF is better and produces better
results than other AF, and the best results are obtained with the modified DenseNet121
pre-trained CNN model. This is because Swish AF works well with deep CNNs, and the DenseNet121
model has 121 layers, producing the best results with respect to various evaluation metrics.
The details are shown in Table 6.
As far as our proposed modified DenseNet121 model is concerned, it consists of four dense
blocks, each containing multiple convolution layers. Between these dense blocks, the
TABLE 9. Comparison of various evaluation metrics of our proposed architectures with Swish AF and dataset [78].
TABLE 10. Performance comparison of our proposed work with the existing related work based on same dataset [12].
transition layers perform convolution and pooling operations to reduce the feature map size
and the number of channels. Due to dense connections, it has fewer parameters, as described
in detail in Table 2. It also has the property of feature reuse, which improves learning
efficiency and produces efficient results, as shown in Fig. 10.
Table 7 gives the overall assessment of various evaluation metrics of our proposed
architecture for different pre-trained CNN models. The various evaluation metrics, such as
Precision, Recall or Sensitivity, F1-Score, Specificity, and MCC, provide an understanding of
the model’s effectiveness, especially when dealing with MRI image datasets, which is often the
case in medical imaging such as Brain Tumor diagnosis in our case.
Our study evaluated various state-of-the-art pre-trained models with different AF with our
proposed layers for brain tumor classification. Our results demonstrated the efficacy of the
proposed modified DenseNet121 architecture with the Swish AF in achieving improved performance
as compared to the rest of the modified pre-trained CNN models with various AF.
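The threshold-free nature of the AUC discussed in this subsection can be made concrete with a small sketch. It relies on the probabilistic interpretation of the AUC given by Hanley and McNeil [75]: the AUC equals the probability that a randomly chosen positive (tumor) image receives a higher score than a randomly chosen negative image, with ties counted as one half. The function name and the toy scores are hypothetical; this is not the evaluation code behind Fig. 9.

```python
def auc_from_scores(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Toy sigmoid outputs for four images (illustrative values only).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print("AUC:", auc_from_scores(labels, scores))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a test set of a few hundred images; a rank-based formulation would be preferable for larger sets.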
FIGURE 11. Confusion matrices for the proposed modified pre-trained CNN architectures with dataset [78] (a) ResNet50V2 model (b) InceptionV3 model
(c) NASNetMobile model (d) Xception model (e) DenseNet121 model using Swish AF.
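A binary confusion matrix such as those captioned above reduces to Precision, Recall (Sensitivity), F1-Score, Specificity, and MCC through a few ratios. The sketch below uses the standard definitions, which we assume match those given earlier in the paper; the function name and the counts are hypothetical and do not reproduce any panel of Fig. 11.

```python
import math

def binary_metrics(tp, fn, fp, tn):
    """Derive common binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    # Matthews correlation coefficient; the denominator is zero only if
    # a whole row or column of the matrix is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }

# Hypothetical counts for illustration only (not results from this study).
for name, value in binary_metrics(tp=90, fn=10, fp=5, tn=95).items():
    print(f"{name}: {value:.4f}")
```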
B. COMPREHENSIVE ANALYSIS OF COMPUTATIONAL EFFICIENCY AND CROSS-DATASET PERFORMANCE
In Subsection V-A above, we discussed the performance of our proposed model based on various
evaluation metrics. In this subsection, our analysis uncovers the inner workings of our
models, exploring their architectural characteristics and efficiency metrics. This
comprehensive evaluation reveals each model’s structural characteristics, including total
parameters, model size, and trainable versus non-trainable parameters. We also examined
efficiency aspects, such as inference time and training time. This combined perspective
provides a thorough understanding of each model’s strengths and weaknesses and facilitates
informed decision-making. The details are given in Table 8. From Table 8, it is evident that
our modified DenseNet121 model has the lowest number of trainable parameters and achieves the
shortest training and inference time compared to the other CNN architectures. This highlights
the superior computational efficiency of the proposed model, which ultimately determines the
robustness and scalability of our proposed approach. These results also demonstrate that our
proposed approach offers an optimal balance between performance and efficiency, making it
suitable for real-world applications. To further substantiate the efficacy of our proposed
methodology, we conducted a supplementary evaluation on an additional dataset, Brain MRI
Images for Brain Tumor Detection [78]. This dataset contains 600 normal brain MRI images and
600 tumor brain MRI images, yielding a combined total of 1200 MRI images. We applied a
train-test-validation split, using 900 images for training, 255 for testing, and 45 for
validation. The results based on various evaluation metrics are shown in Table 9. Detailed
confusion matrices, as shown in Fig. 11, give us a better understanding of the model’s ability
to adapt and succeed with various datasets. They also offer valuable insights into the
model’s capacity for generalization across disparate data distributions.
Hence, from Table 9 it is apparent that even when we applied our proposed modified
DenseNet121 model to a smaller dataset, it consistently delivered superior performance across
the various evaluation metrics. This shows the robustness and generalizability of our proposed
approach across different dataset sizes or distributions.
C. ASSESSMENT OF THE PROPOSED MODEL COMPARED TO ESTABLISHED METHODS
This subsection compares our proposed architecture with current state-of-the-art pre-trained
model architectures. We compared recent models evaluated on the same dataset [12] that we used
for our task. Huang et al. reported a performance of 95.49% on MR images by using a CNN based
on a complex networks technique [20]. Similarly, various authors have used a variety of
approaches and techniques on brain MRI images. However, our proposed approach shows the best
performance by using a modified DenseNet121 model, with better accuracy as well as
computational efficiency compared to other existing models, as described in detail in
Table 10.
VI. CONCLUSION AND FUTURE PROSPECT
Our research article’s primary focus was to develop a binary classification system for brain
abnormalities. We proposed five modified pre-trained models, ResNet50V2, InceptionV3,
NASNetMobile, Xception, and DenseNet121, using the TL technique for feature extraction to
improve the effectiveness of binary classification of brain MRI images. Additionally, we
considered seven AF, namely Swish, Mish, SELU, GELU, PReLU, ReLU, and Tanh, separately in our
proposed dense layers. We examined the individual effects of each AF on each of the five
modified pre-trained models. We also addressed each AF’s various properties, benefits, and
issues concerning the different pre-trained CNN models. We concluded that our proposed
architecture, by utilizing a modified DenseNet121 model-based CNN architecture with a
fine-tuned TL approach along with Swish AF, outperformed all other modified pre-trained CNN
models with various AF by achieving a balanced test accuracy of 99.14% on the Brain Tumor
Classification dataset.
Moreover, our proposed architecture performs similarly with SELU and GELU AF, but Mish and
Swish AF significantly improve the performance. It is important to note that our study used a
limited number of datasets with two classes of brain abnormalities. However, the proposed
architecture with an added feature extraction technique can potentially be applied to other
state-of-the-art DL architectures to classify various brain diseases using MRI images.
Multiclass classification can also be applied to different datasets based on our proposed
architecture. Also, we can incorporate various explainability techniques such as
Gradient-weighted Class Activation Mapping (GradCAM), Local Interpretable Model-agnostic
Explanations (LIME), and SHapley Additive exPlanations (SHAP) to elucidate our proposed
models’ decision-making processes. This may contribute to developing more transparent and
trustworthy AI systems in medical imaging.
DISCLOSURES
Funding: External funding has not been provided for this research work. Conflict of interest:
The authors declare that they have no conflict of interest.
CODE, DATA, AND MATERIALS AVAILABILITY
The code used in this study is available from the corresponding author upon reasonable
request. The publicly available brain MRI datasets utilized in this research were obtained
from Kaggle, and can be accessed at
https://www.kaggle.com/datasets/ahmedhamada0/brain-tumor-detection/ and
https://www.kaggle.com/datasets/emmanuelkibetl/brain-scans.
[32] A. Kumar, J. Kim, D. Lyndon, M. Fulham, and D. Feng, ‘‘An ensemble of fine-tuned convolutional neural networks for medical image classification,’’ IEEE J. Biomed. Health Informat., vol. 21, no. 1, pp. 31–40, Jan. 2017, doi: 10.1109/JBHI.2016.2635663.
[33] S. M. Anwar, M. Majid, A. Qayyum, M. Awais, M. Alnowami, and M. K. Khan, ‘‘Medical image analysis using convolutional neural networks: A review,’’ J. Med. Syst., vol. 42, no. 11, pp. 1–13, Nov. 2018, doi: 10.1007/s10916-018-1088-1.
[34] A. M. Alhassan and W. M. N. W. Zainon, ‘‘Brain tumor classification in magnetic resonance image using hard swish-based RELU activation function-convolutional neural network,’’ Neural Comput. Appl., vol. 33, no. 15, pp. 9075–9087, 2021, doi: 10.1007/s00521-020-05671-3.
[35] S. Kanwal, F. Khan, S. Alamri, K. Dashtipur, and M. Gogate, ‘‘COVID-opt-aiNet: A clinical decision support system for COVID-19 detection,’’ Int. J. Imag. Syst. Technol., vol. 32, no. 2, pp. 444–461, Mar. 2022, doi: 10.1002/ima.22695.
[36] M. M. Lau and K. H. Lim, ‘‘Investigation of activation functions in deep belief network,’’ in Proc. 2nd Int. Conf. Control Robot. Eng. (ICCRE), Apr. 2017, pp. 201–206. [Online]. Available: https://ieeexplore.ieee.org/document/7935070
[37] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ‘‘ImageNet classification with deep convolutional neural networks,’’ Commun. ACM, vol. 60, no. 6, pp. 84–90, May 2017, doi: 10.1145/3065386.
[38] K. He, X. Zhang, S. Ren, and J. Sun, ‘‘Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1026–1034, doi: 10.1109/ICCV.2015.123.
[39] G. Klambauer, T. Unterthiner, A. Mayr, and S. Hochreiter, ‘‘Self-normalizing neural networks,’’ in Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017, pp. 1–10. [Online]. Available: https://proceedings.neurips.cc/paper-files/paper/2017/hash/5d44ee6f2c3f71b73125876103c8f6c4-Abstract.html
[40] D. Hendrycks and K. Gimpel, ‘‘Gaussian error linear units (GELUs),’’ 2016, arXiv:1606.08415.
[41] D. Misra, ‘‘Mish: A self regularized non-monotonic neural activation function,’’ 2019, arXiv:1908.08681.
[42] P. Ramachandran, B. Zoph, and Q. V. Le, ‘‘Swish: A self-gated activation function,’’ 2017, arXiv:1710.05941.
[43] E. C. Too, L. Yujian, P. K. Gadosey, S. Njuki, and F. Essaf, ‘‘Performance analysis of nonlinear activation function in convolution neural network for image classification,’’ Int. J. Comput. Sci. Eng., vol. 21, no. 4, pp. 522–535, 2020, doi: 10.1504/ijcse.2020.106866.
[44] R. Siouda, M. Nemissi, and H. Seridi, ‘‘Diverse activation functions based-hybrid RBF-ELM neural network for medical classification,’’ Evol. Intell., vol. 17, no. 2, pp. 829–845, Apr. 2024, doi: 10.1007/s12065-022-00758-3.
[45] S. Verma, A. Chug, and A. P. Singh, ‘‘Revisiting activation functions: Empirical evaluation for image understanding and classification,’’ Multimedia Tools Appl., vol. 83, no. 6, pp. 18497–18536, Jul. 2023, doi: 10.1007/s11042-023-16159-2.
[46] R. H. K. Emanuel, P. D. Docherty, H. Lunt, and K. Möller, ‘‘The effect of activation functions on accuracy, convergence speed, and misclassification confidence in CNN text classification: A comprehensive exploration,’’ J. Supercomput., vol. 80, no. 1, pp. 292–312, Jan. 2024, doi: 10.1007/s11227-023-05441-7.
[47] S. J. Pan and Q. Yang, ‘‘A survey on transfer learning,’’ IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010, doi: 10.1109/TKDE.2009.191.
[48] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, ‘‘How transferable are features in deep neural networks?’’ in Proc. Adv. Neural Inf. Process. Syst. (NIPS), vol. 27, 2014, pp. 1–9.
[49] I. S. Rajput, A. Gupta, V. Jain, and S. Tyagi, ‘‘A transfer learning-based brain tumor classification using magnetic resonance images,’’ Multimedia Tools Appl., vol. 83, no. 7, pp. 20487–20506, Aug. 2023, doi: 10.1007/s11042-023-16143-w.
[50] K. He, X. Zhang, S. Ren, and J. Sun, ‘‘Identity mappings in deep residual networks,’’ in Proc. 14th Eur. Conf., Amsterdam, The Netherlands. Cham, Switzerland: Springer, Oct. 2016, pp. 630–645, doi: 10.1007/978-3-319-46493-0_38.
[51] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, ‘‘Rethinking the inception architecture for computer vision,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 2818–2826, doi: 10.1109/CVPR.2016.308.
[52] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, ‘‘Learning transferable architectures for scalable image recognition,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 8697–8710, doi: 10.1109/CVPR.2018.00907.
[53] F. Chollet, ‘‘Xception: Deep learning with depthwise separable convolutions,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 1800–1807, doi: 10.1109/CVPR.2017.195.
[54] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, ‘‘Densely connected convolutional networks,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 2261–2269, doi: 10.1109/CVPR.2017.243.
[55] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, ‘‘ImageNet: A large-scale hierarchical image database,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 248–255, doi: 10.1109/CVPR.2009.5206848.
[56] S. Kornblith, J. Shlens, and Q. V. Le, ‘‘Do better ImageNet models transfer better?’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2019, pp. 2661–2671, doi: 10.1109/CVPR.2019.00277.
[57] S. Ruder, ‘‘An overview of gradient descent optimization algorithms,’’ 2016, arXiv:1609.04747.
[58] D. P. Kingma and J. Ba, ‘‘Adam: A method for stochastic optimization,’’ 2014, arXiv:1412.6980.
[59] H. N. Fakhouri, S. Alawadi, F. M. Awaysheh, and F. Hamad, ‘‘Novel hybrid success history intelligent optimizer with Gaussian transformation: Application in CNN hyperparameter tuning,’’ Cluster Comput., vol. 27, no. 3, pp. 3717–3739, Jun. 2024, doi: 10.1007/s10586-023-04161-0.
[60] M. Kim, J. Yun, Y. Cho, K. Shin, R. Jang, H.-J. Bae, and N. Kim, ‘‘Deep learning in medical imaging,’’ Neurospine, vol. 16, no. 4, pp. 657–668, Dec. 2019, doi: 10.14245/ns.1938396.198.
[61] Mikulskibartosz Blog. (2023). Understanding the Keras Layer Input Shapes. [Online]. Available: https://mikulskibartosz.name/understanding-the-keras-layer-input-shapes
[62] J. S. Paul, A. J. Plassard, B. A. Landman, and D. Fabbri, ‘‘Deep learning for brain tumor classification,’’ Proc. SPIE, vol. 10137, Mar. 2017, Art. no. 1013710, doi: 10.1117/12.2254195.
[63] A. Subasi, ‘‘Chapter 2—Data preprocessing,’’ in Practical Machine Learning for Data Analysis Using Python, A. Subasi, Ed., Cambridge, MA, USA: Academic Press, 2020, pp. 27–89, doi: 10.1016/B978-0-12-821379-7.00002-3.
[64] S. K. Mathivanan, S. Sonaimuthu, S. Murugesan, H. Rajadurai, B. D. Shivahare, and M. A. Shah, ‘‘Employing deep learning and transfer learning for accurate brain tumor detection,’’ Sci. Rep., vol. 14, no. 1, p. 7232, Mar. 2024, doi: 10.1038/s41598-024-57970-7.
[65] M. A. K. Raiaan, S. Sakib, N. M. Fahad, A. A. Mamun, M. A. Rahman, S. Shatabda, and M. S. H. Mukta, ‘‘A systematic review of hyperparameter optimization techniques in convolutional neural networks,’’ Decis. Anal. J., vol. 11, Jun. 2024, Art. no. 100470, doi: 10.1016/j.dajour.2024.100470.
[66] T. Katona, G. Tóth, M. Petró, and B. Harangi, ‘‘Developing new fully connected layers for convolutional neural networks with hyperparameter optimization for improved multi-label image classification,’’ Mathematics, vol. 12, no. 6, p. 806, Mar. 2024, doi: 10.3390/math12060806.
[67] M. Wojciuk, Z. Swiderska-Chadaj, K. Siwek, and A. Gertych, ‘‘Improving classification accuracy of fine-tuned CNN models: Impact of hyperparameter optimization,’’ Heliyon, vol. 10, no. 5, Mar. 2024, Art. no. e26586, doi: 10.1016/j.heliyon.2024.e26586.
[68] H. N. T. K. Kaldera, S. R. Gunasekara, and M. B. Dissanayake, ‘‘Brain tumor classification and segmentation using faster R-CNN,’’ in Proc. Adv. Sci. Eng. Technol. Int. Conf. (ASET), Mar. 2019, pp. 1–6, doi: 10.1109/ICASET.2019.8714263.
[69] G. Phillips, H. Teixeira, M. G. Kelly, F. S. Herrero, G. Várbíró, A. L. Solheim, and S. Poikane, ‘‘Setting nutrient boundaries to protect aquatic communities: The importance of comparing observed and predicted classifications using measures derived from a confusion matrix,’’ Sci. Total Environ., vol. 912, Feb. 2024, Art. no. 168872, doi: 10.1016/j.scitotenv.2023.168872.
[70] A. J. Larner, The 2×2 Matrix: Contingency, Confusion and the Metrics of Binary Classification. Berlin, Germany: Springer, 2024.
[71] E. S. Edgington, ‘‘Estimating the population mean from a one-stage cluster sample,’’ Educ. Psychol. Meas., vol. 33, no. 3, pp. 607–611, Oct. 1973, doi: 10.1177/001316447303300308.
[72] J. Wang, Y. Yang, and B. Xia, ‘‘A simplified Cohen’s Kappa for use in binary classification data annotation tasks,’’ IEEE Access, vol. 7, pp. 164386–164397, 2019, doi: 10.1109/ACCESS.2019.2953104.
[73] S. Kumar, R. Dhir, and N. Chaurasia, ‘‘Brain tumor detection analysis using CNN: A review,’’ in Proc. Int. Conf. Artif. Intell. Smart Syst. (ICAIS), Mar. 2021, pp. 1061–1067, doi: 10.1109/ICAIS50930.2021.9395920.
[74] Z. A. Sejuti and M. S. Islam, ‘‘An efficient method to classify brain tumor using CNN and SVM,’’ in Proc. 2nd Int. Conf. Robot., Electr. Signal Process. Techn. (ICREST), Jan. 2021, pp. 644–648, doi: 10.1109/ICREST51555.2021.9331060.
[75] J. A. Hanley and B. J. McNeil, ‘‘The meaning and use of the area under a receiver operating characteristic (ROC) curve,’’ Radiology, vol. 143, no. 1, pp. 29–36, Apr. 1982, doi: 10.1148/radiology.143.1.7063747.
[76] W. Xu, J. Dai, Y. S. Hung, and Q. Wang, ‘‘Estimating the area under a receiver operating characteristic (ROC) curve: Parametric and nonparametric ways,’’ Signal Process., vol. 93, no. 11, pp. 3111–3123, Nov. 2013, doi: 10.1016/j.sigpro.2013.05.010.
[77] J. V. Carter, J. Pan, S. N. Rai, and S. Galandiuk, ‘‘ROC-ing along: Evaluation and interpretation of receiver operating characteristic curves,’’ Surgery, vol. 159, no. 6, pp. 1638–1645, Jun. 2016, doi: 10.1016/j.surg.2015.12.029.
[78] E. Kibet. (2024). Brain Scans. Kaggle. Accessed: Sep. 28, 2024. [Online]. Available: https://www.kaggle.com/datasets/emmanuelkibetl/brain-scans
[79] J. Kataria and S. P. Panda, ‘‘HybridCSF model for magnetic resonance image based brain tumor segmentation,’’ Indonesian J. Electr. Eng. Comput. Sci., vol. 35, no. 3, p. 1845, Sep. 2024, doi: 10.11591/ijeecs.v35.i3.pp1845-1852.
[80] K. Shah, K. Shah, A. Chaudhari, and D. Kothadiya, ‘‘Comprehensive analysis of deep learning models for brain tumor detection from medical imaging,’’ in Data Science and Applications (Lecture Notes in Networks and Systems), vol. 819, S. J. Nanda, R. P. Yadav, A. H. Gandomi, and M. Saraswat, Eds., Singapore: Springer, 2024, doi: 10.1007/978-981-99-7820-5_28.
SOUMYARASHMI PANIGRAHI (Member, IEEE) received the M.Tech. degree from the Department of
Computer Science and Engineering, Institute of Technical Education and Research (ITER),
Siksha ‘O’ Anusandhan Deemed to be University, Bhubaneswar, Odisha, India, in 2011, where she
is currently pursuing the Ph.D. degree. She has published one conference paper, one book
chapter, and a few research articles are under the review stage. Her research interests
include computer vision, image processing, deep learning, and generative AI.
DIBYA RANJAN DAS ADHIKARY received the M.Tech. and Ph.D. degrees from the Department of
Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi, in 2011 and
2019, respectively. He is currently an Assistant Professor with the Department of Computer
Science and Engineering, Institute of Technical Education and Research (ITER), Siksha ‘O’
Anusandhan Deemed to be University, Bhubaneswar, Odisha, India. He has published more than
20 research papers in different journals and international conferences. His research
interests include machine learning, deep learning, natural language processing, wireless
sensor networks, and pattern matching. He is an active reviewer of various journals and
conferences.
BINOD KUMAR PATTANAYAK (Member, IEEE) received the M.S. degree in computer engineering from
the NTU Kharkov Polytechnical Institute, in 1992, and the Ph.D. degree in computer science
and engineering from Siksha ‘O’ Anusandhan University, Bhubaneswar, India, in 2011. He is
currently a Professor with the Department of Computer Science and Engineering, Institute of
Technical Education and Research, Siksha ‘O’ Anusandhan Deemed to be University. He has
visited Build Bright University, Cambodia, and Universite des Mascareignes, Mauritius, as a
Visiting Professor. To his credit, there are 243 research publications in journals and
conferences of international repute. As many as 23 research scholars have been awarded with
Ph.D. degrees and nine scholars are continuing their Ph.D. research work under his
supervision. His research interests include the Internet of Things (IoT), artificial
intelligence, big data analytics, and cloud computing. He belongs to the editorial boards of
various reputed peer-reviewed international journals. He has edited one Springer Book. He has
acted as the general chair and the program chair in various international conferences.