ABSTRACT
Hyperparameter tuning plays a pivotal role in the accuracy and reliability of
convolutional neural network (CNN) models used in brain tumor diagnosis. These
hyperparameters exert control over various aspects of the neural network,
encompassing feature extraction, spatial resolution, non-linear mapping,
convergence speed, and model complexity. We propose a meticulously refined CNN
hyperparameter model designed to optimize critical parameters, including filter
number and size, stride, padding, pooling techniques, activation functions, learning
rate, batch size, and the number of layers. Our approach leverages two publicly
available brain tumor MRI datasets for research purposes. The first dataset comprises
a total of 7,023 human brain images, categorized into four classes: glioma,
meningioma, no tumor, and pituitary. The second dataset contains 253 images
classified as “yes” and “no.” Our approach delivers excellent results, achieving an average precision, recall, and F1-score of 94.25% with 96% accuracy for dataset 1, and an average precision, recall, and F1-score of 87.5% with 88% accuracy for dataset 2. To affirm the robustness of our findings, we perform a comprehensive
comparison with existing techniques, revealing that our method consistently
outperforms these approaches. By systematically fine-tuning these critical
hyperparameters, our model not only enhances its performance but also bolsters its generalization capabilities. This optimized CNN model provides medical experts with a more precise and efficient tool for supporting their decision-making processes in brain tumor diagnosis.

Subjects Computer Vision, Neural Networks
Keywords Hyperparameter tuning, Brain tumor diagnosis, Feature extraction, Spatial resolution, Model complexity, Decision-making processes, Optimization techniques
INTRODUCTION
Brain tumors, among the deadliest cancers with the lowest survival rates, pose challenges in early detection due to their irregular shapes and diffuse borders. Accurate analysis at the initial stage is crucial for precise medical interventions and saving lives. Brain tumors manifest as benign (non-cancerous) or malignant (cancerous) types, with primary and secondary distinctions based on origin (Siegel, Miller & Jemal, 2015; Sauer, 2019).
Common types include meningioma, glioma, and pituitary cancer. Meningiomas
originate from the meninges, gliomas from glial cells supporting nerve function, and
pituitary tumors impact various bodily processes (Abiwinanda et al., 2019; Abir, Siraji &
Khulna, 2018). Understanding these types and their characteristics is vital for effective
diagnosis and treatment, supporting healthcare professionals in providing appropriate
care.
Identifying and estimating the duration of brain tumors presents a significant challenge
in medical diagnostics. The datasets used for this purpose comprise images obtained
through various diagnostic techniques, including biopsies, spinal taps, computed
tomography scans, and magnetic resonance imaging. These datasets undergo
segmentation, classification, and feature extraction based on specific requirements. Deep
learning techniques have emerged as highly effective tools in this domain, particularly in
brain tumor detection. Unlike traditional methods focusing on segmenting tumors for
classification and feature extraction, deep learning approaches employ classification
algorithms to identify and categorize brain tumors. Deep learning, a branch of machine
learning and artificial intelligence, mimics how humans acquire knowledge. Deep learning
algorithms handle complex and abstract tasks, surpassing the performance of traditional
linear machine learning systems that are more suited for smaller datasets. By harnessing
the power of deep learning, accurate and efficient brain tumor diagnosis becomes a reality,
potentially revolutionizing the field of medical imaging and enhancing patient care
(Naseer et al., 2020).
Accurate segmentation of brain tumors is crucial for cancer diagnosis, treatment
planning, and outcome evaluation. However, manual segmentation is a laborious, time-
consuming, and challenging task. To overcome these limitations, researchers have
extensively investigated automatic and semi-automatic brain tumor segmentation
methods (Núñez-Martín, Cervera & Pulla, 2017). These methods are built upon either generative or discriminative models. The discriminative model relies on image features to categorize normal and malignant tissues, while the generative model utilizes probabilistic information obtained from images for brain tumor segmentation. Classification techniques, such as support vector machines (SVM) and random forest, are commonly employed in discriminative models (Kleesiek, 2014) based on visual features
like local histograms, image textures, and structure tensor eigenvalues. These research
efforts aim to develop efficient and reliable segmentation approaches that can alleviate the
burden of manual segmentation, enabling accurate tumor delineation and facilitating
treatment planning and evaluation (Meier et al., 2014).
Deep learning algorithms are now widely used for object identification, classification, and feature extraction. In particular, convolutional neural networks are acknowledged as an outstanding method for semantic image segmentation, and CNN-based algorithms have produced reliable results (Long, Shelhamer & Darrell, 2015). The most advanced of these mechanisms, the convolutional neural network (CNN), can automatically learn hierarchical features from images, making it well suited to brain tumor classification.
The remaining sections of the manuscript are structured as follows: related work, which describes current advancements and their limitations; methodology, which details our hyperparameter-based CNN model, the two brain tumor datasets used, their characteristics, and the preprocessing steps; results, which presents the outcomes of the applied model; and conclusion, which summarizes the article and outlines future directions.
METHODOLOGY
This section explains the overall structure of the methodology and provides a detailed explanation of every parameter used in the proposed system. A graphical representation of the proposed work is illustrated in Fig. 1.
Dataset details
This study utilized two brain tumor MRI datasets publicly available on Kaggle. Dataset 1 comprises a total of 7,023 human brain images in JPG format with dimensions of 512 × 512. It consists of four classes: glioma (1,621), meningioma (1,645), no tumor (2,000), and pituitary (1,757). The “no tumor” class images were sourced from the Br35H dataset. Figure 2 illustrates the different categories present in the dataset, including no tumor, meningioma, pituitary, and glioma. Dataset 2 consists of 253 images with yes (155) and no (98) classes. To train and evaluate the hyperparameter-tuned model for brain tumor detection, dataset 1 was divided into training, validation, and testing sets, while dataset 2 was split into training and testing sets at an 80:20 ratio. A detailed description of the datasets can be found in Tables 2 and 3.
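To make the split concrete, the snippet below sketches one way to produce an 80:20 train/test split for dataset 2. The folder layout (dataset2/yes, dataset2/no), output paths, and random seed are illustrative assumptions rather than the authors' exact pipeline.

```python
# A minimal sketch of an 80:20 train/test split for dataset 2, assuming a
# folder layout of dataset2/yes and dataset2/no; paths and seed are illustrative.
import os
import random
import shutil

SRC = "dataset2"          # assumed source folder with one subfolder per class
DST = "dataset2_split"    # output folder containing train/ and test/ per class
SPLIT = 0.8               # 80:20 training-to-testing ratio
random.seed(42)           # fixed seed for reproducibility

for cls in ("yes", "no"):
    files = sorted(os.listdir(os.path.join(SRC, cls)))
    random.shuffle(files)
    cut = int(len(files) * SPLIT)
    for subset, names in (("train", files[:cut]), ("test", files[cut:])):
        out_dir = os.path.join(DST, subset, cls)
        os.makedirs(out_dir, exist_ok=True)
        for name in names:
            shutil.copy(os.path.join(SRC, cls, name), os.path.join(out_dir, name))
```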
Pre-processing
Pre-processing a brain tumor MRI dataset involves several steps to optimize the data for analysis and modeling, including addressing challenges such as varying resolutions and intensity ranges. Rescaling the images to a standardized resolution ensures consistency across the dataset, while normalizing intensity values benefits subsequent algorithms and models. Techniques such as rescaling pixel values to a specific range [0, 1] or applying z-score normalization are commonly employed. Aligning the MRI images to a standard reference frame is also essential because of variations in position and orientation.
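As an illustration of the two normalization options mentioned above, the following sketch applies min-max rescaling to [0, 1] and z-score normalization to a single image array; the helper names are ours and not taken from the authors' code.

```python
# A minimal sketch of the two normalization schemes mentioned above;
# helper names are illustrative, not taken from the authors' codebase.
import numpy as np

def minmax_rescale(img: np.ndarray) -> np.ndarray:
    """Rescale pixel intensities to the [0, 1] range."""
    img = img.astype(np.float32)
    return (img - img.min()) / (img.max() - img.min() + 1e-8)

def zscore_normalize(img: np.ndarray) -> np.ndarray:
    """Standardize intensities to zero mean and unit variance."""
    img = img.astype(np.float32)
    return (img - img.mean()) / (img.std() + 1e-8)
```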
C = \frac{1}{S}\sum_{i \in N} \ln P\left(X(a_i/b_i)\right) \quad (4)
In the training process, S represents the number of training samples, b_i denotes the i-th training sample, and a_i is its corresponding label. The classification probability, denoted as X(a_i/b_i), is used to minimize the cost C through stochastic gradient descent. To calculate the weights of each convolutional layer L, the weight of the convolutional layer L at iteration t is represented by W_L^t, as depicted in Eq. (5). Here W_L^t is the weight of convolutional layer L and V_L^{t+1} is the updated weight value at iteration t. The essential component of convolutional neural networks, feature extraction, is made possible by the convolutional layer, which contains several filters for extracting features. The resultant value and layer sizes are evaluated using Eqs. (6) and (7), respectively; here n_L^i is the resultant feature map of the images, σ is the activation function, y_L is the input width, and x_L^i ∈ f and z_i ∈ f are the channels of the filter f.
A convolutional neural network often employs a pooling layer after each convolutional layer. This layer reduces the number of parameters and thereby helps control overfitting. The most widely used variant is max pooling, which behaves differently from alternatives such as min pooling and average pooling. Equations (8) and (9) determine the output and the size of the pooling layer, respectively, where x denotes the feature-map values within the pooling region R.
Pl_{i,j} = \max_{r,s \in R} x_{r,s} \quad (8)

\text{Pooling layer output size} = \frac{\text{convolutional layer output size} - \text{pooling size}}{\text{stride}} + 1 \quad (9)
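To ground Eq. (9) and the analogous convolution output-size relation, the small sketch below computes layer output sizes. It assumes the standard (input − window) / stride + 1 rule; the function names are illustrative.

```python
# A small sketch of the standard output-size arithmetic assumed by Eq. (9);
# the convolutional variant with padding is included for completeness.
def conv_output_size(input_size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size of a convolutional layer's output feature map."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

def pool_output_size(conv_output: int, pool_size: int, stride: int) -> int:
    """Spatial size of a pooling layer's output, as in Eq. (9)."""
    return (conv_output - pool_size) // stride + 1

# Example: a 512x512 input, 3x3 kernel, stride 1, no padding, then 2x2 max pooling.
conv = conv_output_size(512, kernel_size=3)              # 510
pooled = pool_output_size(conv, pool_size=2, stride=2)   # 255
print(conv, pooled)
```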
num_filters: The number of filters determines the number of feature maps extracted
by the convolutional layers. Higher values of num_filters may allow the model to
capture more complex patterns but can also increase computational requirements.
num_units: The number of units represents the size of the fully connected layers. It
controls the model’s capacity to learn complex relationships in the data. The values
used range from 64 to 256. Higher values of num_units generally allow the model to
capture more intricate patterns, but they can also increase the risk of overfitting if not
balanced appropriately.
Dropout: Dropout is a regularization technique that helps prevent overfitting by
randomly dropping out a fraction of the units during training. A dropout rate of 0.1
or 0.2 is used in the experiments. Higher dropout rates provide more regularization
but can potentially decrease the model’s learning capacity.
Optimizer: The optimizer determines the algorithm used to update the model
weights during training. Two optimizers are used: Stochastic Gradient Descent (SGD)
and Adaptive Moment Estimation (Adam). Adam is an adaptive optimizer that
dynamically adjusts the learning rate, while SGD uses a fixed one. Adam generally
performs well in a wide range of scenarios, but the choice between the two can depend
on the specific problem and dataset.
Accuracy: Accuracy represents the model's performance on the validation or test set. It indicates the proportion of correctly classified samples. The accuracy values range from 0.75 to 0.96, with different hyperparameter settings achieving varying levels of accuracy. It is important to note that accuracy alone does not provide a complete picture of model performance, which is why precision, recall, F1-score, and AUC are also reported. A sketch of how these hyperparameters can be assembled into a model is shown below.
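As an illustration of how the four tuned hyperparameters fit together, the following minimal sketch builds a small CNN from a (num_filters, num_units, dropout, optimizer) configuration. It assumes a Keras/TensorFlow implementation; the layer layout, input shape, and the function name build_model are our illustrative choices, not the authors' published architecture.

```python
# A minimal sketch of a CNN builder parameterized by the four tuned
# hyperparameters; layer layout and input shape are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(num_filters: int, num_units: int, dropout: float,
                optimizer: str, num_classes: int = 4,
                input_shape=(128, 128, 1)) -> tf.keras.Model:
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(num_filters, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(num_filters * 2, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(num_units, activation="relu"),
        layers.Dropout(dropout),
        layers.Dense(num_classes, activation="softmax"),
    ])
    # optimizer is either "adam" or "sgd", matching the two options described above
    model.compile(optimizer=optimizer,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Example configuration from the grid: 16 filters, 64 units, dropout 0.2, Adam.
model = build_model(num_filters=16, num_units=64, dropout=0.2, optimizer="adam")
```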
In Fig. 3, the x-axis represents the hyperparameter combinations (num_filters, num_units, dropout, optimizer) and the y-axis shows the accuracy of the neural network. The graph is a line graph with multiple lines, each representing a different combination of the parameters. In the evaluation of CNN architectures with varied hyperparameters, namely the number of filters, units, dropout rate, and optimizer, the accuracies of 0.96 obtained across different configurations highlight a remarkable consistency. The combinations tested, encompassing variations in the number of filters (16, 64) and units (64, 128, 256) with a constant dropout rate of 0.2 and the Adam optimizer, all yield identical accuracies. In contrast, one specific set of hyperparameters, comprising 16 filters, 64 units, a dropout rate of 0.1, and the SGD optimizer, yielded a comparatively lower accuracy of 0.75.
In Fig. 4, on the second dataset, the highest accuracies achieved were 0.9, obtained from
two distinct configurations. The first configuration utilized 16 filters, 64 units, a dropout
rate of 0.2, and employed the ‘SGD’ optimizer. Meanwhile, the second configuration
comprised 32 filters, 64 units, a dropout rate of 0.1, and utilized the Adam optimizer. Both
configurations yielded the same highest accuracy of 0.9, showcasing the effectiveness of
these particular hyperparameter settings on this dataset. The lowest accuracies observed
were both recorded at 0.39, resulting from two separate configurations. The first
configuration involved 64 filters, 128 units, a dropout rate of 0.1, and utilized the Adam
optimizer. Similarly, the second configuration consisted of 64 filters, 64 units, a dropout
rate of 0.2, also employing the Adam optimizer. Both configurations resulted in the same
lowest accuracy, indicating that these specific combinations of hyperparameters and
optimizer choices might not be effectively capturing the essential patterns or features
within this particular dataset.
By comparing the hyperparameter values and their corresponding accuracies, it is
possible to identify trends and gain insights into the effect of different configurations on
the model performance. However, further analysis and experimentation, such as cross-validation and statistical significance testing, may be required to draw definitive conclusions about the optimal hyperparameter settings. We did not use Bayesian hyperparameter tuning in our research; although it is a powerful technique, we chose random search for several reasons. Simplicity and ease of implementation:
Random search is a simpler and easier-to-implement technique than Bayesian
optimization. This is important because our research aims to provide a practical solution
that can be easily adopted by researchers and practitioners who might have limited
computational resources.
Baseline comparison: We wanted to establish a baseline comparison against which the
performance of our fine-tuned CNN model could be evaluated. By using random search,
we can ensure that the improvements achieved by our proposed approach can be
attributed to the fine-tuning strategy itself, rather than the specific optimization algorithm.
General applicability: Random search is a versatile method that can work effectively
across a wide range of problem domains. We wanted to demonstrate the effectiveness of
our fine-tuned CNN approach in a generalizable manner, showcasing its potential
applicability to various medical image analysis tasks beyond brain tumor diagnosis.
Efficiency and exploration: Random search provides a good balance between exploration
and exploitation of the hyperparameter space. While Bayesian optimization is highly
efficient in exploitation, random search’s exploration-centric nature allowed us to
comprehensively explore the hyperparameter configurations, potentially uncovering
valuable insights.
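For concreteness, the sketch below shows a random search over the hyperparameter grid discussed above. It reuses the hypothetical build_model helper from the earlier sketch, and x_train/y_train and x_val/y_val are placeholders for the prepared training and validation splits; the loop itself is our illustration, not the authors' code.

```python
# A minimal random-search sketch over the hyperparameter grid discussed above;
# build_model is the illustrative helper defined earlier, and x_train/y_train,
# x_val/y_val stand in for the prepared training and validation splits.
import random

search_space = {
    "num_filters": [16, 32, 64],
    "num_units": [64, 128, 256],
    "dropout": [0.1, 0.2],
    "optimizer": ["adam", "sgd"],
}

random.seed(0)
best_acc, best_config = 0.0, None
for _ in range(20):  # number of randomly sampled configurations
    config = {name: random.choice(values) for name, values in search_space.items()}
    model = build_model(**config)
    model.fit(x_train, y_train, epochs=10, batch_size=32, verbose=0)
    _, acc = model.evaluate(x_val, y_val, verbose=0)
    if acc > best_acc:
        best_acc, best_config = acc, config

print("Best validation accuracy:", best_acc, "with", best_config)
```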
RESULTS
This section presents the experimental results of the hyperparameter fine-tuned CNN, assessed with various evaluation criteria. The proposed hyperparameter-tuned CNN model was implemented in Python on a computer system with a 6 GB GTX 1060 GPU, an 8th-generation Core i7 processor, and 16 GB of RAM to compute the brain tumor classification results. The evaluation criteria used were precision, recall, F1-score, accuracy, and AUC.
Model results
The confusion matrix of the test data for the four-class (dataset 1) and two-class (dataset 2)
classification is shown in Figs. 5 and 6. The test dataset 1 includes four categories:
meningioma, pituitary, no tumor, and glioma, while dataset 2 contains the yes and no classes. The confusion matrix is a square n × n matrix, where n is the number of classes; for dataset 1 it is therefore 4 × 4 and for dataset 2 it is 2 × 2. Each cell in the matrix represents a combination of predicted and actual
class labels. The numbers within the matrix represent the total number of images utilized
for classification. Each entry in the matrix corresponds to the count of images that belong
to a specific actual class (represented by rows) and were predicted to belong to a specific
predicted class (represented by columns).
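As a sketch of how such a confusion matrix and the per-class metrics can be produced from model predictions, assuming scikit-learn and the hypothetical model and test split from the earlier sketches:

```python
# A minimal sketch of computing the confusion matrix and per-class metrics
# from model predictions; y_test and x_test are stand-ins for the test split.
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

class_names = ["glioma", "meningioma", "no tumor", "pituitary"]  # dataset 1

probs = model.predict(x_test)            # shape: (num_samples, num_classes)
y_pred = np.argmax(probs, axis=1)        # predicted class index per sample

print(confusion_matrix(y_test, y_pred))  # rows: actual class, columns: predicted class
print(classification_report(y_test, y_pred, target_names=class_names, digits=2))
```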
The model's ability to generalize to new data is measured by the training and validation accuracy, as shown in Figs. 7 and 8; validation accuracy is determined on a separate dataset not used during training. The x-axis represents the number of epochs, and the y-axis represents the
accuracy. The graph has two lines, one for training accuracy and one for validation
accuracy. The training accuracy line is a dashed blue line, and the validation accuracy line
is a dotted orange line. The training accuracy starts at around 0.75 and increases steadily to
around 0.98. The validation accuracy starts at around 0.8 and increases steadily to around
0.95. This comparison helps identify overfitting and evaluates the model's performance on unseen data. A model may have overfitted to the training data and failed to generalize if training accuracy is high but validation accuracy is noticeably lower. Validation loss denotes the discrepancy between the model's predictions and the actual targets in the validation
dataset, as shown in Figs. 9 and 10 for dataset 1 and dataset 2. It acts as a gauge of the model's effectiveness on previously unseen data. Like the training loss, it is computed with a loss function, and the objective is to minimize it to improve the model's accuracy on new, unseen samples.
The ROC curve illustrates the trade-off between the true positive rate (sensitivity) and the false positive rate (1 − specificity) across various classification thresholds, as shown in Figs. 11 and 12 for dataset 1 and dataset 2. Each point on the curve shows the model's performance at a different threshold setting. As the threshold changes, the true positive rate is plotted against the false positive rate to form the curve. The AUC value summarizes the overall effectiveness of the model: it represents the likelihood that a randomly chosen positive sample will be ranked higher than a randomly chosen negative sample. AUC values range from 0 to 1, with higher values indicating better performance and discrimination capacity.
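For the binary task of dataset 2, the ROC curve and AUC can be computed as sketched below, assuming scikit-learn and a model with a single sigmoid output for the “yes” class; variable names are placeholders.

```python
# A minimal sketch of computing the ROC curve and AUC for the binary
# yes/no task; y_test and x_test are stand-ins for the test split, and the
# model is assumed to output a single sigmoid probability for the "yes" class.
from sklearn.metrics import roc_curve, roc_auc_score

scores = model.predict(x_test).ravel()           # probability of the "yes" class
fpr, tpr, thresholds = roc_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)
print(f"AUC = {auc:.2f}")
```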
The statistical information in Tables 4 and 5 summarizes brain tumor classification and detection performance. The results include an average
precision, recall, and F1-score of 0.94, showing remarkable accuracy in detecting brain
tumors. The accuracy of 0.96 shows the model’s overall performance in accurately
classifying cases of brain tumors. At the same time, the AUC value of 0.99 indicates
outstanding discrimination abilities due to the fine-tuning of hyperparameters of CNN for
dataset 1.
In dataset 2, for the ‘Yes’ class, the precision, recall, and F1-score are all 0.90, indicating consistent performance in predicting this class. The support for ‘Yes’ is 31, meaning that this class appeared 31 times in the test set. The AUC for ‘Yes’ is also 0.90, indicating good discrimination ability for this class.
The ‘No’ class shows slightly lower precision, recall, and F1-score at 0.85 but has a reported AUC of 0.10, which is unusually low; useful classifiers typically achieve AUC values between 0.5 and 1. This is a point of interest, possibly indicating issues with the model's ability to distinguish the ‘No’ class correctly. The ‘Average’ row displays the mean precision, recall, and F1-score over both classes, all 87.5%. The average AUC is 0.5; because an AUC of 0.5 corresponds to chance-level discrimination, this average is dominated by the anomalous ‘No’-class value and should be interpreted with caution.
Figure 13 shows a comparison of accuracy values for different hyperparameters of the CNN model. The accuracy values are plotted against the number of filters (num_filters), number of units (num_units), dropout rate (dropout), and optimizer. Two optimizers are used: Stochastic Gradient Descent (SGD) and Adaptive Moment Estimation (Adam). Adam is an adaptive optimizer that dynamically adjusts the learning rate, while SGD uses a fixed one. The highest accuracy value of 96% has an occurrence rate of 11.1%, and the lowest accuracy value of 75% has an occurrence rate of 2.8%. The occurrence rate peaks at 19.4% for accuracies of 92% and 95%.
For dataset 2, the accuracy values of the proposed model range from 0.39 to 0.90 across configurations, as shown in Fig. 14. The associated percentages show how frequently each accuracy level occurs among the tested configurations. For instance, an accuracy of 0.61 occurs for 27.6% of configurations, indicating that many configurations cluster at this relatively low accuracy. Conversely, some values, such as 0.81, 0.82, 0.85, and 0.86, each occur for 3.4% of configurations, implying more stable performance. Accuracies of 0.39, 0.73, 0.76, 0.84, and 0.90 each occur for 6.9% of configurations. Moreover, a relatively high accuracy level of 0.88 occurs for 10.3% of configurations.
Table 6 presents a comprehensive comparison of various brain tumor classification
methods, revealing their respective accuracy scores. Each method’s accuracy value
indicates the percentage of correctly classified brain tumor images in the dataset. Notably,
the “Fine-tuned ResNet-50 with CNN” achieves an accuracy of 0.95, leveraging the
combined strengths of the fine-tuned ResNet-50 model and a CNN architecture. Similarly,
the “Hybrid approach” attains an accuracy of 0.90 by employing a combination of
traditional machine learning algorithms and deep learning models, enhancing
classification robustness. The “U-Net with fine-tuned ResNet50” method achieves an
accuracy of 0.94, benefiting from U-Net’s segmentation capabilities in conjunction with a
fine-tuned ResNet-50 model. The “Hybrid Ensemble” and “CNN Ensemble” methods both
achieve an accuracy of 0.95 through ensembles of diverse models and techniques.
CONCLUSION
In this study, we developed a new CNN model for brain tumor diagnosis. We carefully
tuned the hyperparameters of the model, including the number of filters, filter size, stride, padding, pooling techniques, activation functions, learning rate, batch size, and layer
configuration. This resulted in a significant improvement in diagnostic accuracy. We tested
our model on two publicly available brain tumor MRI datasets. For the first dataset, which
contains 7,023 brain images across four classes, our model achieved an accuracy of 96%,
along with an average precision, recall, and F1-score of 94.25%. For the second dataset,
which contains 253 images, our model achieved an accuracy of 88%, along with precision,
recall, and F1-score values of 87.5%. These results demonstrate the effectiveness of our
CNN model for brain tumor detection and classification. Our model outperforms existing
methods in terms of accuracy and efficacy, and it offers the potential to improve the
precision and efficiency of brain tumor diagnosis. This could lead to earlier detection and
treatment of brain tumors, which could save lives. In addition, our study demonstrates the
importance of hyperparameter optimization in CNN models for medical diagnostics. By
carefully tuning the hyperparameters of our model, we were able to achieve significant
improvements in accuracy. This suggests that hyperparameter optimization could be used
to improve the performance of CNN models for other medical applications, such as cancer
detection and diagnosis.
Funding
This work is supported by the Deanship of Scientific Research, Najran University,
Kingdom of Saudi Arabia under the Distinguished Research funding program grant code
number (NU/DRP/MRC/12/28). The funders had no role in study design, data collection
and analysis, decision to publish, or preparation of the manuscript.
Grant Disclosures
The following grant information was disclosed by the authors:
Deanship of Scientific Research, Najran University, Kingdom of Saudi Arabia, Distinguished Research funding program: NU/DRP/MRC/12/28.
Competing Interests
The authors declare that they have no competing interests.
Data Availability
The following information was supplied regarding data availability:
The code is available at GitHub and Zenodo:
- https://github.com/iamshaf/CNN-HyperParameter.
- iamshaf. (2024). iamshaf/CNN-HyperParameter: HPCNN (HPCNN). Zenodo.
https://doi.org/10.5281/zenodo.10476557.
Dataset 1 is available at Kaggle: https://www.kaggle.com/datasets/masoudnickparvar/brain-tumor-mri-dataset.
Dataset 2 is available at Kaggle: https://www.kaggle.com/datasets/navoneel/brain-mri-images-for-brain-tumor-detection.
REFERENCES
Abir TA, Siraji JA, Khulna AE. 2018. Analysis of a novel MRI based brain tumour classification
using probabilistic neural network (PNN). International Journal of Scientific Research in Science,
Engineering and Technology 4(8):65–79.
Abiwinanda N. 2018. Brain tumor classification using convolutional neural network. In: World
Congress on Medical Physics and Biomedical Engineering. Cham: Springer, 1.
Abiwinanda N, Hanif M, Hesaputra ST, Handayani A, Mengko TR. 2019. Brain tumor
classification using convolutional neural network. In: IFMBE Proceedings. Singapore: Springer
Nature, 183–189.
Al-Ayyoub M, Husari G, Darwish O, Alabed-alaziz A. 2012. Machine learning approach for brain
tumor detection. In: Proceedings of the 3rd International Conference on Information and
Communication Systems. New York: ACM.
Alsaif H, Guesmi R, Alshammari BM, Hamrouni T, Guesmi T, Alzamil A, Belguesmi L. 2022. A
novel data augmentation-based brain tumor detection using convolutional neural network.
Applied Sciences 12(8):3773 DOI 10.3390/app12083773.