Brain Tumor Detection and Classification

LICENSE
CC BY 4.0

23-10-2021 / 26-10-2021

CITATION
Ghosh, Ankit; Kole, Alok (2021): A Comparative Study of Enhanced Machine Learning Algorithms for Brain Tumor Detection and Classification. TechRxiv. Preprint. https://doi.org/10.36227/techrxiv.16863136.v1

DOI
10.36227/techrxiv.16863136.v1
application in the field of medical imaging. Noreen et al. [18] have proposed the use of two pre-trained deep learning models, i.e. Inception-v3 and DenseNet201, for developing a multi-level feature extraction and concatenation method for the early detection of brain tumors and their classification. At first, they extracted features from different Inception modules of the pre-trained Inception-v3 model and passed those features to a softmax classifier to perform the classification of the brain tumors. Secondly, they used a pre-trained DenseNet201 to extract features from various DenseNet blocks, then concatenated those features and passed them to the softmax classifier to classify the brain tumors. The dataset that they used comprises three classes of brain tumors and is publicly available. Their proposed methodology produced exceptional results and outperformed the existing state-of-the-art ML and Deep Learning (DL) models for brain tumor detection and classification.

In [19] Naik and Patel have used the decision tree classification algorithm for the detection and classification of brain tumors from MRI images. In the pre-processing step they applied median filtering, and a texture feature extraction technique was used to extract the features. Their proposed model exhibited improved efficiency in comparison to traditional image mining methods, and the results obtained were compared with those of the Naïve Bayesian classification algorithm. The decision tree classification algorithm achieved a precision of 100%, sensitivity of 93%, specificity of 100% and accuracy of 96%.

Tandel et al. [20] have proposed a transfer-learning-based AI paradigm using a Convolutional Neural Network (CNN) for brain tumor classification using MRI data. The transfer-learning-based CNN model has been benchmarked against six different ML classification algorithms, namely Decision Tree, Linear Discriminant, Naive Bayes, Support Vector Machine, K-nearest neighbour and Ensemble. Their proposed model has proven to be very useful in multiclass brain tumour grading and has yielded better results than the other ML models.

Sarhan [21] has presented a computer-aided detection (CAD) technique for the classification of brain tumors in MRI images. The features from the brain MRI images are extracted using the Discrete Wavelet Transform (DWT) and then applied to a CNN to classify the input MRI image. His proposed approach has produced an overall accuracy of 98.5%.

Mohsen et al. [22] in their research work have proposed a Deep Neural Network (DNN) classifier for the classification of brain tumors on a dataset comprising 66 brain MRI images of 4 types of brain tumors, namely normal, glioblastoma, sarcoma and metastatic bronchogenic carcinoma. They combined the classifier with DWT for feature extraction and with principal component analysis (PCA). The DNN classifier yielded extremely good results, with an average classification rate of 96.97%, average recall of 0.97, average precision of 0.97, average F-measure of 0.97 and average area under the ROC curve (AUC) of 0.984 over all the four classes (normal, glioblastoma, sarcoma and metastatic bronchogenic carcinoma tumors).

In [23] Rehman et al. have conducted three studies using three architectures of convolutional neural networks (AlexNet, GoogLeNet, and VGGNet) to perform the classification of brain tumors such as meningioma, glioma, and pituitary. They have then explored transfer learning techniques, i.e. fine-tuning and freezing, using MRI slices of the Figshare brain tumor dataset. They have applied data augmentation techniques to the MRI images to generalize the results, increase the number of dataset samples and reduce the chance of over-fitting. The proposed fine-tuned VGG16 architecture has attained the highest accuracy, up to 98.69%, in terms of classification and detection.

III. DATASETS

Two different datasets have been used:
1. Dataset-A (binary classification)
2. Dataset-B (multi-class classification)

A. Dataset-A (binary classification)
Dataset-A has been used for binary classification. It comprises 982 brain MRI images of patients with tumor and 493 images with no tumor; thus, a total of 1475 images are present. A collection of 18 brain tumor images from Dataset-A is shown in Fig. 1.

Fig. 1. Collection of 18 brain tumor MRI images from Dataset-A

15 images with no brain tumor from Dataset-A are shown in Fig. 2.

Fig. 2. Collection of 15 brain MRI images with no tumor from Dataset-A
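To make the dataset description concrete, the sketch below shows one way such a collection could be loaded with OpenCV; the folder layout dataset_a/yes and dataset_a/no is an illustrative assumption, not the dataset's published structure.

import os
import cv2  # OpenCV, for image I/O
import numpy as np

def load_dataset_a(root="dataset_a"):
    # Hypothetical layout: root/yes -> 982 tumor images, root/no -> 493 normal images
    images, labels = [], []
    for folder, label in (("yes", 1), ("no", 0)):
        for fname in sorted(os.listdir(os.path.join(root, folder))):
            img = cv2.imread(os.path.join(root, folder, fname),
                             cv2.IMREAD_GRAYSCALE)  # read as 2D gray scale
            if img is not None:  # skip any non-image files
                images.append(img)
                labels.append(label)
    return images, np.array(labels)

images, labels = load_dataset_a()
print(len(images), "images:", int((labels == 1).sum()), "tumor,",
      int((labels == 0).sum()), "no tumor")  # expected: 1475, 982, 493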
there are chances of over-fitting, so Random Forest is used to avoid over-fitting of the data. Random Forest can be used to solve both classification and regression problems. Stochastic Gradient Descent (SGD) is an efficient optimization algorithm which minimizes the cost function by altering the values of the parameters or coefficients of a function [30]. The SGD classifier implements an SGD learning routine supporting various loss functions to perform classification tasks. Extreme Gradient Boosting (XGBoost) is a member of the family of boosting algorithms. It is an efficient implementation of the Gradient Boosted Trees algorithm, which is a supervised learning method; it is an ensemble ML technique and uses the Gradient Boosting framework for prediction [31]. Boosting is an ensemble learning technique that combines predictors with low accuracy into a model with improved accuracy [32]. In gradient boosting, each predictor corrects the errors made by its predecessors, resulting in a strong model with high accuracy.

V. METHODOLOGY

The proposed methodology has been illustrated in Fig. 7.

Fig. 7. Flowchart of the proposed workflow

A. Methodology (binary classification)
This section describes the methodology used to predict whether or not a patient has a brain tumor from the brain MRI images in Dataset-A.

1) Data pre-processing
Data labeling
The images with brain tumor have been labeled as '1' and the images with no brain tumor as '0'.

Image pre-processing
The images have been read in gray scale (2D). To build a classifier using ML algorithms, all the images have been converted to the same dimension; so each image has been resized to 200*200 pixels. For instance, the original image shown in Fig. 8 has a dimension of 630*630 pixels; its dimension has been transformed into 200*200 pixels. Similarly, each and every image in the dataset has been converted into a dimension of 200*200 pixels.

Fig. 8. Image pre-processing (Dataset-A)

2) Splitting the dataset
The entire dataset has been split into training and testing data with a test size of 25%.

3) ML models used
The following ML algorithms have been implemented to perform the binary classification task (a minimal sketch follows this section):
SVM
Logistic Regression
KNN
NB
DT
Random Forest
SGD classifier
XGBoost
Gradient Boosting classifier

B. Methodology (multi-class classification)
This section describes the methodology for multi-class classification of the brain MRI images in Dataset-B.

1) Data pre-processing
Data labeling
The images with no tumor have been labeled as '0', images of glioma brain tumor as '1', images of meningioma as '2' and images of pituitary tumor as '3'.

Image pre-processing
Each and every image in Dataset-B has been resized to a dimension of 200*200 pixels. For instance, the original image shown in Fig. 9 has a dimension of 350*350 pixels; it has been converted into a dimension of 200*200 pixels.

Fig. 9. Image pre-processing (Dataset-B)
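A minimal sketch of this pre-processing, splitting and model set is given below, reusing the images and labels arrays from the loading sketch above and relying on scikit-learn and xgboost defaults rather than the authors' exact hyper-parameters.

import cv2
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from xgboost import XGBClassifier

# 1) Pre-processing: resize every gray-scale image to 200*200 and flatten it.
X = np.array([cv2.resize(img, (200, 200)).ravel() for img in images],
             dtype=np.float32) / 255.0  # scaling to [0, 1] is an assumption
y = labels  # binary: 1 = tumor, 0 = no tumor (Dataset-B uses labels 0-3 instead)

# 2) Splitting the dataset: 25% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# 3) ML models used (library defaults; the paper's settings may differ).
models = {
    "SVM": SVC(probability=True),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "NB": GaussianNB(),
    "DT": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
    "SGD": SGDClassifier(loss="log_loss"),  # "log" in older scikit-learn versions
    "XGBoost": XGBClassifier(),
    "Gradient Boosting": GradientBoostingClassifier(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "test accuracy:", round(model.score(X_test, y_test), 3))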
Fig. 15. ROC curve of Random Forest classifier

Fig. 16. ROC curve of SGD classifier

Fig. 17. ROC curve of XGBoost classifier

Fig. 18. ROC curve of Gradient Boosting classifier

Table III
Performance comparison of the models based on AUC-ROC

model                  AUC-ROC
SVM                    0.888
Logistic Regression    0.886
KNN                    0.920
NB                     0.755
DT                     0.847
Random Forest          0.975
SGD                    0.806
XGBoost                0.973
Gradient Boosting      0.972

From the test results in Table III, it can be observed that the AUC-ROC scores of the Random Forest, XGBoost and Gradient Boosting classifiers are extremely close: Random Forest has an AUC-ROC score of 0.975, XGBoost 0.973 and Gradient Boosting 0.972. KNN follows these three classifiers with an AUC-ROC score of 0.920. The AUC-ROC scores of SVM and Logistic Regression have also been quite promising, at 0.888 and 0.886 respectively. DT has an AUC-ROC score of 0.847 and the SGD classifier 0.806. NB has the lowest AUC-ROC score of 0.755. Thus, the performances of the Random Forest, XGBoost and Gradient Boosting classifiers stand out, each having an AUC-ROC score very close to 1. While the Gradient Boosting classifier has the highest accuracy among the classifiers, followed by Random Forest and XGBoost, Random Forest and XGBoost both have slightly higher AUC-ROC scores than Gradient Boosting.
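Scores like those in Table III, as well as the AUC-PR and AP values reported in the next section, can be computed with scikit-learn's metric functions; the sketch below assumes the fitted models dict and test split from the earlier snippet.

from sklearn.metrics import (auc, average_precision_score,
                             precision_recall_curve, roc_auc_score)

for name, model in models.items():
    # Use class-1 probabilities where available, otherwise fall back to the
    # decision function (an SGD classifier with hinge loss has no predict_proba).
    if hasattr(model, "predict_proba"):
        scores = model.predict_proba(X_test)[:, 1]
    else:
        scores = model.decision_function(X_test)

    auc_roc = roc_auc_score(y_test, scores)         # Table III
    ap = average_precision_score(y_test, scores)    # AP (Table IV)
    prec, rec, _ = precision_recall_curve(y_test, scores)
    auc_pr = auc(rec, prec)                         # area under the PR curve (Table IV)
    print(f"{name}: AUC-ROC={auc_roc:.3f}, AUC-PR={auc_pr:.3f}, AP={ap:.3f}")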
Fig. 24. PR curve of Random Forest classifier

Fig. 25. PR curve of SGD classifier

Fig. 27. PR curve of Gradient Boosting classifier

Table IV
Performance comparison of the models based on AUC-PR and AP

model                  AUC-PR   AP
SVM                    0.829    0.896
Logistic Regression    0.837    0.892
KNN                    0.860    0.946
NB                     0.723    0.790
DT                     0.832    0.871
Random Forest          0.946    0.988
SGD                    0.804    0.837
XGBoost                0.940    0.988
Gradient Boosting      0.914    0.987

The evaluation metrics of the Random Forest, XGBoost and Gradient Boosting classifiers have been compared in Table V in order to conclude which of these classifiers has exhibited the best overall performance.

Table V
Evaluation metrics of Random Forest, XGBoost and Gradient Boosting classifiers

metric      Random Forest   XGBoost   Gradient Boosting
Accuracy    0.908           0.894     0.924
Recall      0.952           0.888     0.944
Precision   0.811           0.818     0.850
F1-score    0.876           0.852     0.895
AUC-ROC     0.975           0.973     0.972
AUC-PR      0.946           0.940     0.914
AP          0.988           0.988     0.987

As depicted in Table V, the Gradient Boosting classifier has an accuracy of 0.924, which is higher than that of the Random Forest classifier by (0.924-0.908=) 0.016 and exceeds the accuracy of the XGBoost classifier by (0.924-0.894=) 0.030. The precision of Gradient Boosting is 0.850; the precision of the XGBoost classifier is 0.818, which is less than that of Gradient Boosting by (0.850-0.818=) 0.032, and the precision of Random Forest is 0.811, which falls short of the precision of the Gradient Boosting classifier by (0.850-0.811=) 0.039. The F1-score of the Gradient Boosting classifier is 0.895, which is higher than that of XGBoost and Random Forest; XGBoost and Random Forest have F1-scores of 0.852 and 0.876 respectively. However, the AUC-ROC and AUC-PR scores of both Random Forest and XGBoost are higher than those of Gradient Boosting. Random Forest has an AUC-ROC score of 0.975, which is higher than that of Gradient Boosting by (0.975-0.972=) 0.003, and the AUC-ROC score of XGBoost is higher than that of Gradient Boosting by only (0.973-0.972=) 0.001. Gradient Boosting has an AUC-PR score of 0.914, which is less than that of Random Forest by (0.946-0.914=) 0.032 and that of XGBoost by (0.940-0.914=) 0.026. Therefore, after comparing all the evaluation metrics, it can be concluded that the Gradient Boosting classifier has exhibited the best performance altogether, with accuracy, recall, precision, F1-score, AUC-ROC and AUC-PR scores of 0.924, 0.944, 0.850, 0.895, 0.972 and 0.914 respectively.

Also, the training time and prediction time of all the ML algorithms have been evaluated. This has been described in Table VI.

Table VI
Comparison of training time and prediction time among the ML algorithms

model                  training time (s)   prediction time (s)
SVM                    39.325               12.24
Logistic Regression    12.523               0.05
KNN                    13.079               47.655
NB                     2.362                1.393
DT                     24.41                0.101
Random Forest          13.268               0.172
SGD                    2.629                0.061
XGBoost                145.068              0.65
Gradient Boosting      1180.088             0.172

As demonstrated in Table VI, the NB classifier takes the least training time among all the algorithms, at 2.362 s. Gradient Boosting takes the maximum time for training, at 1180.088 s; its prediction time, however, is 0.172 s. The prediction time of KNN is the highest, taking 47.655 s to predict the outcome. Logistic Regression, with a prediction time of 0.05 s, is the fastest among the ML algorithms in predicting the outcome.

The test results of the multi-class classification problem have been described in the following sections.

B. Multi-class classification results
A comparative analysis of the 4 ML algorithms has been done based on the following evaluation metrics:
Accuracy
Recall (weighted average)
Precision (weighted average)
F1-score (weighted average)
AUC-ROC

The performance comparison of the proposed algorithms on the basis of accuracy, recall, precision and F1-score has been demonstrated in Table VII.

Table VII
Performance comparison of the proposed algorithms

metric                         SVM    KNN    Random Forest   XGBoost
Accuracy                       0.85   0.77   0.89            0.90
Recall (weighted average)      0.85   0.77   0.89            0.90
Precision (weighted average)   0.87   0.82   0.89            0.90
F1-score (weighted average)    0.85   0.78   0.89            0.90

As depicted in Table VII, XGBoost has outperformed the other models in terms of accuracy, recall, precision and F1-score, producing an accuracy, recall, precision and F1-score of 0.90 each.

In order to visualize the performance of the multi-class classifiers, the AUC-ROC curves of the four ML algorithms have also been plotted.

ROC curves of the multi-class classifiers for performance comparison
The ROC curves corresponding to the SVM, KNN, Random Forest and XGBoost classifiers have been shown in Fig. 28, 29, 30 and 31 respectively.
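Timings like those in Table VI and weighted-average metrics like those in Table VII can be measured as sketched below; the loop works equally for the nine binary models or the four multi-class ones, assuming a models dict and the corresponding X_train/X_test, y_train/y_test split, and the absolute times will of course depend on the hardware used.

import time
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

for name, model in models.items():
    t0 = time.perf_counter()
    model.fit(X_train, y_train)      # training time (cf. Table VI)
    t1 = time.perf_counter()
    y_pred = model.predict(X_test)   # prediction time (cf. Table VI)
    t2 = time.perf_counter()

    # Weighted averages weight each class by its support (cf. Table VII).
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="weighted")
    print(f"{name}: train={t1 - t0:.3f}s, predict={t2 - t1:.3f}s, "
          f"acc={accuracy_score(y_test, y_pred):.2f}, prec={prec:.2f}, "
          f"rec={rec:.2f}, f1={f1:.2f}")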
Fig. 28. ROC curve of SVM multi-class classifier

Fig. 31. ROC curve of XGBoost classifier

Table VIII
AUC-ROC scores of the proposed models

model           AUC-ROC
SVM             0.931
KNN             0.899
Random Forest   0.989
XGBoost         0.990
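Per-class ROC curves like those in Figs. 28-31 and aggregate scores like those in Table VIII can be obtained with a one-vs-rest scheme, sketched below under two assumptions: the models dict holds the four fitted multi-class classifiers (each exposing predict_proba), and macro averaging is used, since the paper does not state which averaging produced Table VIII.

from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.preprocessing import label_binarize

classes = [0, 1, 2, 3]  # no tumor, glioma, meningioma, pituitary
y_bin = label_binarize(y_test, classes=classes)  # one indicator column per class

for name, model in models.items():
    proba = model.predict_proba(X_test)  # shape: (n_samples, 4)

    # One ROC curve per class, one-vs-rest (cf. Figs. 28-31).
    for k in classes:
        fpr, tpr, _ = roc_curve(y_bin[:, k], proba[:, k])
        # fpr/tpr can be passed to matplotlib to draw the per-class curve

    # Aggregate one-vs-rest AUC-ROC (cf. Table VIII); macro averaging assumed.
    score = roc_auc_score(y_test, proba, multi_class="ovr", average="macro")
    print(f"{name}: AUC-ROC = {score:.3f}")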
performance metrics such as accuracy, recall, precision, F1-score, AUC-ROC curves and AUC-PR curves. After the evaluation of the test scores, it has been concluded that Gradient Boosting is the best classifier among all the ML classifiers that have been used. Also, multi-class classification has been performed on a different dataset comprising brain MRI images of glioma, meningioma, pituitary and no tumor using the SVM, KNN, Random Forest and XGBoost classifiers. The ML algorithms have been compared based on accuracy, recall, precision, F1-score and AUC-ROC score, and it has been observed that the XGBoost classifier has exhibited the best results. In future, one of the most important improvements that can be made is adjusting the architecture so that it can be used during brain surgery for classifying and accurately locating the tumor. Detecting the tumors in the operating theatre can be performed in real-time conditions; thus, in that case, the improvement would also involve adapting the network architecture to a 3D system. By keeping the network architecture simple, detection in real time can be made possible.

REFERENCES

[1] Lars Kunze, Nick Hawes, Tom Duckett, Marc Hanheide and Tomáš Krajník, "Artificial Intelligence for Long-Term Robot Autonomy: A Survey", IEEE Robotics and Automation Letters, vol.3, issue.4, pp.4023-4030, 2018.
[2] Li Deng, "Artificial Intelligence in the Rising Wave of Deep Learning: The Historical Path and Future Outlook [Perspectives]", IEEE Signal Processing Magazine, vol.35, issue.1, pp.180-187, 2018.
[3] Alok Kole, "Design and Stability Analysis of Adaptive Fuzzy Feedback Controller for Nonlinear Systems by Takagi-Sugeno Model based Adaptation Scheme", International Journal of Soft Computing, vol.19, issue.6, pp.1747-1763, 2015.
[4] Ruimin Ke, Yifan Zhuang, Ziyuan Pu and Yinhai Wang, "A Smart, Efficient, and Reliable Parking Surveillance System With Edge Artificial Intelligence on IoT Devices", IEEE Transactions on Intelligent Transportation Systems, vol.22, issue.8, pp.4962-4974, 2021.
[5] P. P. Bhattacharya, Alok Kole, Tanmay Maity and Ananya Sarkar, "Neural Network Based Energy Efficiency Enhancement in Wireless Sensor Networks", International Journal of Applied Engineering Research, vol.9, no.22, pp.11807-11818, 2014.
[6] Huimin Lu, Yujie Li, Min Chen, Hyoungseop Kim and Seiichi Serikawa, "Brain Intelligence: Go beyond Artificial Intelligence", Mobile Networks and Applications, vol.23, pp.368-375, 2018.
[7] Sanjeevani Bhardwaj and Alok Kole, "Review and Study of Internet of Things: It's the Future", in Proc. IEEE International Conference on Intelligent Control, Power and Instrumentation (ICICPI-2016), Kolkata, India, 2016, pp.47-50.
[8] Daniel E. O'Leary, "Artificial Intelligence and Big Data", IEEE Intelligent Systems, vol.28, issue.2, pp.96-99, 2013.
[9] Jiaying Liu, Xiangjie Kong, Feng Xia, Xiaomei Bai, Lei Wang, Qing Qing and Ivan Lee, "Artificial Intelligence in the 21st Century", IEEE Access, vol.6, pp.34403-34421, 2018.
[10] Chinmaya Kumar Pradhan, Shariar Rahaman, Md. Abdul Alim Sheikh, Alok Kole and Tanmoy Maity, "EEG Signal Analysis Using Different Clustering Techniques", in Proc. International Conference on Emerging Technologies in Data Mining and Information Security, Kolkata, West Bengal, 2018, pp.99-105.
[11] Alejandro F. Frangi, Sotirios A. Tsaftaris and Jerry L. Prince, "Simulation and Synthesis in Medical Imaging", IEEE Transactions on Medical Imaging, vol.37, issue.3, pp.673-679, 2018.
[12] Norio Nakata, "Recent technical development of artificial intelligence for diagnostic medical imaging", Japanese Journal of Radiology, vol.37, pp.103-108, 2019.
[13] Subhamoy Mandal, Aaron B. Greenblatt and Jingzhi An, "Imaging Intelligence: AI Is Transforming Medical Imaging Across the Imaging Spectrum", IEEE Pulse, vol.9, issue.5, pp.16-24, 2018.
[14] Ankit Ghosh, Purbita Kole and Alok Kole, "Automatic Identification of Covid-19 from Chest X-ray Images using Enhanced Machine Learning Techniques", International Research Journal of Engineering and Technology (IRJET), vol.8, issue.9, no.115, pp.765-772, 2021.
[15] Miles N. Wernick, Yongyi Yang, Jovan G. Brankov, Grigori Yourganov and Stephen C. Strother, "Machine Learning in Medical Imaging", IEEE Signal Processing Magazine, vol.27, issue.4, pp.25-38, 2010.
[16] Sanjay Saxena, Neeraj Sharma and Shiru Sharma, "Image Processing Tasks using Parallel Computing in Multi core Architecture and its Applications in Medical Imaging", International Journal of Advanced Research in Computer and Communication Engineering, vol.2, issue.4, pp.1896-1900, 2013.
[17] Chetanpal Singh, "Medical Imaging using Deep Learning Models", European Journal of Engineering and Technology Research, vol.6, issue.5, pp.156-167, 2021.
[18] Neelum Noreen, Sellappan Palaniappan, Abdul Qayyum, Iftikhar Ahmad, Muhammad Imran and Muhammad Shoaib, "A Deep Learning Model Based on Concatenation Approach for the Diagnosis of Brain Tumor", IEEE Access, vol.8, pp.55135-55144, 2020.
[19] Janki Naik and Sagar Patel, "Tumor Detection and Classification using Decision Tree in Brain MRI", IJCSNS International Journal of Computer Science and Network Security, vol.14, no.6, pp.87-91, 2014.
[20] Gopal S. Tandel, Antonella Balestrieri, Tanay Jujaray, Narender N. Khanna, Luca Saba and Jasjit S. Suri, "Multiclass magnetic resonance imaging brain tumor classification using artificial intelligence paradigm", Computers in Biology and Medicine, vol.122, pp.103804-103860, 2020.
[21] Ahmad M. Sarhan, "Detection and Classification of Brain Tumor in MRI Images Using Wavelet Transform and Convolutional Neural Network", Journal of Advances in Medicine and Medical Research, vol.32, issue.12, pp.15-16, 2020.
[22] Heba Mohsen, El-Sayed A. El-Dahshan, El-Sayed M. El-Horbaty and Abdel-Badeeh M. Salem, "Classification using deep learning neural networks for brain tumors", Future Computing and Informatics Journal, vol.3, issue.1, pp.68-71, 2018.
[23] Arshia Rehman, Saeeda Naz, Muhammad Imran Razzak, Faiza Akram and Muhammad Imran, "A Deep Learning-Based Framework for Automatic Brain Tumors Classification Using Transfer Learning", Circuits, Systems, and Signal Processing, Springer, vol.39, pp.757-775, 2020.
[24] Hyun-Chul Kim, Shaoning Pang, Hong-Mo Je, Daijin Kim and Sung Yang Bang, "Constructing support vector machine ensemble", Pattern Recognition, vol.36, issue.12, pp.2757-2767, 2003.
[25] Sandro Sperandei, "Understanding logistic regression analysis", Biochemia Medica, vol.24, no.1, pp.12-18, 2014.
[26] Shichao Zhang, Xuelong Li, Ming Zong, Xiaofeng Zhu and Ruili Wang, "Efficient kNN Classification With Different Numbers of Nearest Neighbors", IEEE Transactions on Neural Networks and Learning Systems, vol.29, issue.5, pp.1774-1785, 2018.
[27] Saurabh Mukherjee and Neelam Sharma, "Intrusion Detection using Naive Bayes Classifier with Feature Reduction", Procedia Technology, vol.4, pp.119-128, 2012.
[28] S. R. Safavian and D. Landgrebe, "A survey of decision tree classifier methodology", IEEE Transactions on Systems, Man, and Cybernetics, vol.21, issue.3, pp.660-674, 1991.
[29] Angshuman Paul, Dipti Prasad Mukherjee, Prasun Das, Abhinandan Gangopadhyay, Appa Rao Chintha and Saurabh Kundu, "Improved Random Forest for Classification", IEEE Transactions on Image Processing, vol.27, issue.8, pp.4012-4024, 2018.
[30] N. Deepa, B. Prabadevi, Praveen Kumar Maddikunta, Thippa Reddy Gadekallu, Thar Baker, M. Ajmal Khan and Usman Tariq, "An AI-based intelligent system for healthcare analysis using Ridge-Adaline Stochastic Gradient Descent Classifier", The Journal of Supercomputing, vol.77, pp.1998-2017, 2021.
[31] Shenglong Li and Xiaojing Zhang, "Research on orthopedic auxiliary classification and prediction model based on XGBoost algorithm", Neural Computing and Applications, vol.32, pp.1971-1979, 2020.