Abstract
Brain tumors are among the most severe medical conditions because abnormal cell growth in brain tissue can become fatal. Timely and accurate diagnosis substantially improves patient prognosis. Currently, medical staff identify tumors by manually analyzing MRI scans during radiological examinations, a method that is slow, difficult to scale to large datasets, and prone to human error. Modern deep learning algorithms show remarkable potential for faster, automated brain tumor identification. This research implements YOLOv11, a real-time object detection model, with an EfficientNet backbone for stronger feature extraction, and combines it with the Segment Anything Model 2 (SAM2) to achieve detailed tumor segmentation. YOLOv11 localizes tumors quickly, and SAM2 then segments them consistently across different tumor sizes and shapes. The models were trained on a brain tumor MRI dataset obtained from Roboflow. EfficientNet substantially improves YOLOv11's detection performance because it extracts complex patterns with few parameters. The YOLOv11-EfficientNet and SAM2 combination delivers fast, precise results suitable for time-critical clinical decision systems. Experimentally, the model reached 87.4% mean average precision at a 0.5 Intersection over Union threshold (mAP@50), an F1 score of 83.5%, precision of 93.7%, recall of 75.3%, and 66.7% mAP@50:95. These outcomes demonstrate the method's reliability for brain tumor detection. Such AI-assisted systems support radiologists' workflows, reduce errors, and enable earlier medical intervention, leading to better patient outcomes.
Introduction
Brain tumors, along with other central nervous system tumors, often result in post-treatment disabilities caused both by the tumors themselves and by the consequences of treatment, including surgery, radiation therapy, and chemotherapy. Even tumors considered benign or low-grade frequently lead to debilitating health conditions. Brain tumors occur across all age groups, though their classifications differ between adult and pediatric populations. Managing symptoms and complications, including fatigue and headache, requires immediate attention to provide quality patient care [1]. MRI allows doctors to identify brain tumors early, diagnose them precisely, and plan treatment strategies by visualizing brain structures in detail. The high-definition images produced by MRI let practitioners detect abnormalities as they emerge and begin intervention immediately. Treatment strategies depend on the specific information MRI provides about tumor position, size, and effects on adjacent tissue. Standard MRI is a fundamental medical imaging approach that generates detailed pictures of brain tissue and muscles through T1- and T2-weighted scanning methods [2]. Functional MRI (fMRI) measures brain activity by tracking blood flow [3], Diffusion Tensor Imaging (DTI) [4] visualizes brain white matter pathways, and Magnetic Resonance Spectroscopy (MRS) [5] evaluates tissue chemistry, all of which help identify brain tumors. Gadolinium-based contrast-enhanced MRI increases tissue visibility, while 3D MRI produces three-dimensional models invaluable for surgical planning. MR Elastography enables diagnosis of tissue stiffness, liver conditions, and tumors [6]. Although MRI intensity values lack absolute physical significance, they remain diagnostically useful: doctors rely on relative intensity differences to reveal anatomical structure even though the values carry no absolute quantitative meaning. Research teams find this relative nature clinically advantageous, since acquiring absolute values would demand multiple scans that introduce acquisition delays and technical complexities.
Noise affects many MRI images and degrades their quality in ways that impair voxel-based analysis. The common approach to noise reduction uses smoothing techniques, although excessive smoothing can eliminate vital structural elements and soften image boundaries. Convolution-based filtering is a principal method for reducing noise and enhancing images during MRI preprocessing, adjusting each voxel's intensity based on the values of its neighbours [7]. The most basic smoothing filter averages voxel values using direct neighbour weighting schemes. Gaussian filtering suppresses high-frequency noise while preserving edges by applying a kernel with binomial distribution properties.
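As a concrete illustration of this preprocessing step, the following minimal sketch applies a Gaussian filter to a hypothetical MRI volume with SciPy; the random array and the sigma value are illustrative assumptions, not the settings used in this work.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Hypothetical 3D MRI volume as a NumPy array (slices, height, width).
volume = np.random.rand(32, 256, 256).astype(np.float32)

# Gaussian smoothing: sigma controls the kernel width; a larger sigma
# suppresses more high-frequency noise but blurs edges more strongly.
smoothed = gaussian_filter(volume, sigma=1.0)
```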
Denoising is a major step for eliminating noise introduced during acquisition. Noise reduction can preserve structural information through Non-Local Means filtering, wavelet-based procedures, and deep learning solutions such as DnCNN. Inhomogeneity correction techniques address intensity problems that originate from shifts in the magnetic field. MRI image registration is a crucial step for matching scans obtained at different time points, from different modalities, or from different subjects [8]. Data augmentation expands the variety of training samples, improving the robustness of deep learning models. Common augmentation methods that improve generalization include random rotations, flipping, scaling, elastic deformations, and intensity modifications that simulate actual clinical variations [9]; a minimal example follows.
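The sketch below assembles an augmentation pipeline of the kind just described using torchvision; the specific transforms and their parameter values are illustrative assumptions rather than this paper's training configuration.

```python
from torchvision import transforms

# Minimal augmentation pipeline mirroring the strategies named above.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),                  # random rotations
    transforms.RandomHorizontalFlip(p=0.5),                 # flipping
    transforms.RandomAffine(degrees=0, scale=(0.9, 1.1)),   # scaling
    transforms.ColorJitter(brightness=0.2, contrast=0.2),   # intensity shifts
    transforms.ToTensor(),
])
# augmented = augment(pil_image)  # applied per image during training
```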
Numerous researchers have examined machine learning and deep learning algorithms for brain tumor detection, striving for better accuracy through error reduction. Convolutional Neural Networks (CNNs) [10] dominate medical image analysis because they detect and segment tumors automatically by extracting complex patterns from medical images. Deep learning leverages strong computational infrastructure to evaluate medical images, removing the need for hand-crafted features during diagnosis. These modern detection methods have increased the precision and speed of brain tumor examination, so automated systems now assist radiologists during early detection and treatment planning. By learning from extensive image data, CNN-based systems can find small anomalies that traditional approaches struggle to spot [11].
Methods for brain tumor detection combine traditional machine learning with deep learning approaches to improve diagnostic precision. Traditional methods segment tumor regions from MRI or CT scans by applying combinations of thresholding, region growing, edge detection, and morphological operations. K-Means clustering extracts tumors by grouping pixels according to intensity, differentiating candidate areas (a minimal sketch appears after this paragraph) [12]. Fuzzy C-Means (FCM) clustering improves boundary detection by analyzing pixel membership probabilities, achieving more precise region separation [13]. Support Vector Machine (SVM) classifiers identify tumor-affected regions by searching high-dimensional feature space for optimal separating planes between tumor and non-tumor regions [14]. K-Nearest Neighbor (KNN) simplifies tumor classification and pattern recognition by relying on feature similarity between data points [15]. Random Forest combines multiple decision trees to fight overfitting and attain better generalization [16]. Naïve Bayes uses probabilistic models to categorize tumors based on feature likelihoods, which speeds up processing on extensive datasets [17]. Medical image analysis also uses rectangular bounding-box detection to visualize potential brain tumors, enabling healthcare workers to examine suspected sites for diagnosis and therapy planning. Detection becomes faster and more efficient when localization and classification are integrated into a single process step.
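As an example of the classical pipeline, the sketch below clusters the intensities of a hypothetical MRI slice with scikit-learn's KMeans and keeps the brightest cluster as a candidate region; the array, cluster count, and brightest-cluster heuristic are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2D MRI slice as a grayscale array.
slice_2d = np.random.rand(256, 256).astype(np.float32)

# Cluster voxel intensities into k groups, as in classical K-Means pipelines.
k = 3
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(
    slice_2d.reshape(-1, 1)
)
label_img = labels.reshape(slice_2d.shape)

# Take the cluster with the highest mean intensity as the candidate mask.
means = [slice_2d[label_img == c].mean() for c in range(k)]
candidate_mask = label_img == int(np.argmax(means))
```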
Faster R-CNN [18] improves detection accuracy through region proposal networks, and Mask R-CNN [19] extends it with instance segmentation that precisely delineates tumor boundaries alongside classification and localization. Real-time tumor detection has been transformed by the Single Shot MultiBox Detector (SSD) [20] and the You Only Look Once (YOLO) series, which balance accuracy and speed in one consolidated process. SSD divides the input image into a grid and predicts bounding boxes with separate aspect ratios within each grid cell, achieving efficient multi-scale object detection [21]. The YOLO series uses a unified architecture that processes the full image in a single pass, yielding lower computational time with high detection precision [22].
Literature Review
This section examines modern techniques for brain tumor diagnosis, artificial intelligence methods, and recent advancements in medical imaging. The analysis begins by evaluating current tumor detection methods that improve both accuracy and efficiency. Brain tumors form when cells multiply uncontrollably within brain tissue, creating a dangerous condition that requires early diagnosis. Accurate tumor segmentation and classification remain difficult for advanced medical imaging and AI because tumors vary in size and structure across different brain regions [23]. Radiologists struggle with brain tumor detection because of this diversity in tumor cell structure. Noreen et al. [24] introduce a feature representation enhancement method that concatenates multi-level features extracted from pre-trained Inception-v3 and DenseNet201 models. Chauhan et al. [25] present PBVit, a Patch-Based Vision Transformer for brain tumor recognition that processes images as small matching patches. By dividing tumor images into equal-sized patches and mapping their spatial connections, PBVit improves its ability to spot detailed patterns in brain images; it significantly outperformed standard CNN systems on the Figshare brain tumor dataset, demonstrating the potential of transformer-based medical imaging. Asiri et al. [26] develop a two-module computerized system that speeds up and improves brain tumor detection. In the first module, an image enhancement technique combines adaptive Wiener filtering with neural networks and independent component analysis to enhance images and improve contrast; in the second, Support Vector Machines segment tumors and validate results, boosting both processing efficiency and diagnostic scores.
Shah et al. [27] present a reliable brain tumor detection system that fine-tunes EfficientNet for MRI image processing. Their model excels at extracting difficult tumor features, improving classification performance and achieving state-of-the-art benchmark results against traditional CNN architectures. Neamah et al. [28] systematically review deep learning models for brain tumor analysis, evaluating CNN-based and transformer-based models on performance metrics, computing requirements, and dataset dependence, thereby clarifying research directions for deep learning in medical imaging. Majib et al. [29] achieve semantic brain tumor detection with VGG-SCNet, a VGG-based deep learning framework whose skip connections preserve important features and boost segmentation performance; testing on MRI data shows it outperforms standalone CNN networks. Abraham et al. [30] introduce a hybrid system that combines dilated convolution with YOLOv8 for MRI-based brain tumor detection, locating tumor regions effectively with improved performance and few false positives, and demonstrating how detection methods can integrate with medical imaging technologies. Solanki et al. [31] review intelligent methods for brain tumor detection and classification, examining AI diagnostic techniques across deep learning, ensemble learning, and hybrid models and their impact on clinical decision-making.
Almufareh et al. [32] present an automated brain tumor segmentation and classification system based on YOLO deep learning models; the framework improves detection accuracy while reducing computational requirements, enabling real-time medical use. Younis et al. [33] analyze the performance of ResNet50 for abnormal brain tumor classification, confirming that deep residual networks process features effectively and achieve superior classification outcomes, making them suitable for tumor detection tasks. Ahmad and Choudhury [34] evaluate deep transfer learning networks for brain tumor detection from MR images, comparing VGG16, ResNet, and Inception for medical image analysis. Asif et al. [35] study multiple deep transfer learning approaches for improved brain tumor detection, finding that limited MRI datasets benefit from transfer learning combined with fine-tuning strategies. Abdel-Gawad et al. [36] present an optimized edge detection technique for brain tumors in MR images, combining modern preprocessing methods to define tumor areas more accurately for better diagnostics. Farzamnia et al. [37] apply the contourlet transform and time-adaptive self-organizing maps to MRI brain tumor detection, achieving more precise tumor localization by detecting high-frequency image characteristics that enhance segmentation quality. Yao et al. [38] show that a modified YOLOv8 accurately identifies tumor lesions in medical images, with an optimized design that improves both feature discovery and misdiagnosis rates. Bibi et al. [39] develop a transfer learning approach to classify brain tumors, where pre-trained deep networks yield high accuracy from scant training data for clinical applications. Jabbar et al. [40] create a hybrid Caps-VGGNet model that detects brain tumors and performs multi-grade segmentation, unifying capsule networks with CNN architectures for better segmentation performance. Wageh et al. [41] introduce a machine learning detection model with deep feature concatenation and genetic feature selection, showing that feature selection substantially improves classification accuracy. Wang et al. [42] describe CNN-based learning methods that extract image features for brain tumor detection and offer guidelines for maximizing CNN architecture potential in medical image analysis. Preetha et al. [43] build an automatic tumor detection solution with a fine-tuned EfficientNet-B4 model that delivers superior accuracy over standard CNN-based frameworks. Musallam et al. [44] design a specialized CNN to detect brain tumors in magnetic resonance images, reducing false positives while improving classification. Lata et al. [45] study deep learning for brain tumor detection within privacy-preserving smart healthcare systems, evaluating encryption methods for data transfer while maintaining accurate diagnosis.
Hossain et al. [46] use a YOLOv3 deep neural network to detect tumors in a portable electromagnetic imaging system, demonstrating successful real-time brain tumor recognition. Khushi et al. [47] enhance multi-class brain tumor detection with a customized EfficientNet-B7 model that performs well across various tumor cell types. Mallampati et al. [48] combine 3D-UNet segmentation features with a hybrid machine learning model for brain tumor detection, finding that volume-based data processing substantially improves detection correctness. Rajendran et al. [49] build an automated deep learning segmentation system for brain tumor MRI images that effectively detects tumor areas to assist radiologists with diagnosis. Ejaz et al. [50] develop a hybrid segmentation method with confidence region detection for precise tumor identification, achieving both high segmentation accuracy and fast computation. Jia and Chen [51] study deep learning approaches for classifying tumors in MRIs, investigating CNN-based architectures for their beneficial properties in medical imaging. Agarwal et al. [52] present a multilayered detection system that combines different machine learning models and uses ensemble learning to improve tumor identification and classification outcomes. Anaya-Isaza and Mera-Jiménez [53] employ data augmentation together with transfer learning for brain tumor detection, showing how augmentation strategies improve model generalization on small datasets. Roy et al. [54] introduce S-Net and SA-Net for brain tumor segmentation; the models capture tumor structural information to enhance segmentation quality in MRI examinations. Liu et al. [55] introduce the Swin Transformer, a hierarchical vision transformer that computes self-attention within shifted windows. Its design efficiently extracts contextual information at both local and global ranges, surpassing standard CNNs in classification, detection, and segmentation tasks; its flexible processing scale and durable feature representations make it well suited to fine-grained medical image analysis. Liu et al. [56] develop the Single Shot MultiBox Detector (SSD), a real-time object detection system that requires no region proposal stage: it finds objects in a single network pass and uses feature layers at multiple scales to recognize objects of different sizes, making its computational speed and efficiency critical for embedded medical systems and real-time applications with strict performance requirements. Tan et al. [57] create EfficientDet, an object detection system that combines EfficientNet backbones with the Bidirectional Feature Pyramid Network (BiFPN). EfficientDet achieves state-of-the-art performance with fewer parameters and FLOPs than competing detectors, and its compound scaling method optimizes resolution, depth, and width simultaneously, providing a strong framework for medical image detection applications that must balance performance and efficiency.
Methodology
YOLOv11-EfficientNet
YOLOv11-EfficientNet is a variant of the YOLO object detection series that integrates an EfficientNet backbone into YOLOv11 (You Only Look Once, version 11). The design builds on architectural elements from earlier models to improve performance while keeping processing efficient. The model features three essential elements: the C3k2 block, Spatial Pyramid Pooling-Fast (SPPF), and the Convolutional block with Parallel Spatial Attention (C2PSA). Together, these innovations let the model extract vital visual features at high speed, making it well suited to real-time operation. YOLOv11-EfficientNet also supports multiple tasks, including instance segmentation, pose estimation, and oriented object detection, which makes it a broad computer vision technology.
The YOLOv11-EfficientNet model retains the standard YOLO architecture of backbone, neck, and head as its fundamental components. The variant incorporates EfficientNet as its backbone network to achieve more accurate feature extraction with fewer parameters. The enhancements to each component improve detection performance and computational speed, making the model suitable for real-time computer vision applications.
The integrated YOLOv11-EfficientNet model is designed for fast, lightweight tumor detection. The network consists of 307 layers, combining EfficientNet with YOLOv11's detection head. The design uses 1.39 million parameters and 1.39 million gradients (1,391,618), balancing model complexity with training speed. Despite its deep architecture, the model requires only 4.0 GFLOPs, making it highly deployable for real-time medical imaging tasks. EfficientNet's MBConv blocks and depthwise separable convolutions keep feature extraction accurate and lightweight, while YOLOv11's improved multi-scale detection head achieves precise tumor localization with reduced processing overhead. The combination delivers high diagnostic accuracy while minimizing overfitting risk on small medical datasets and speeding up both training and inference, which makes the solution practical for clinical settings and embedded devices.
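For orientation, a minimal training sketch using the stock Ultralytics YOLO11 weights is shown below; the paper's EfficientNet-backbone variant would instead be defined through a custom model configuration, which is not reproduced here, and the dataset and image paths are placeholders.

```python
from ultralytics import YOLO

# Load stock YOLO11 nano weights (stand-in for the custom
# EfficientNet-backbone variant described in this work).
model = YOLO("yolo11n.pt")

# Train on a Roboflow-style dataset described by a data.yaml file.
model.train(data="brain-tumor/data.yaml", epochs=25, imgsz=640)

metrics = model.val()              # reports precision, recall, mAP@50, mAP@50:95
results = model("mri_slice.jpg")   # inference returns detected boxes
```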
As Figure 1 shows, the YOLOv11-EfficientNet framework maintains the traditional YOLO structure of three key components: backbone, neck, and head. Each component's design was enhanced to deliver high detection accuracy in real-time operation.
Figure 1. Architecture diagram of the enhanced YOLOv11-EfficientNet model
Backbone
At its core, the YOLOv11-EfficientNet backbone follows an EfficientNet-like design, starting from a stem block containing 3×3 convolution layers and ReLU6 activation. The stem performs spatial reduction and initializes low-level feature extraction. A sequence of MBConv blocks follows, each characterized by its input and output channels, kernel size, stride, expansion ratio, dropout rate, and optional squeeze-and-excitation (SE) mechanism. The MBConv blocks begin with low channel dimensions and widen steadily, letting the model learn features at several hierarchical levels. An expansion ratio of 6 widens the channels before each block projects back to its output dimension, enabling effective feature transformation. Within MBConv blocks, depthwise convolutions reduce computation while supporting high performance, and SE modules strengthen channel responses by focusing on vital features. The model applies stride 2 at particular layers for downsampling and stride 1 elsewhere to maintain spatial resolution.
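The following PyTorch sketch illustrates the MBConv structure described above (expansion, depthwise convolution, SE, linear projection); the channel counts and SE reduction factor are illustrative assumptions rather than the exact configuration used here.

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Channel re-weighting via global pooling and a two-layer bottleneck."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU6(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
    def forward(self, x):
        return x * self.gate(x)

class MBConv(nn.Module):
    """Expand -> depthwise conv -> SE -> project, with a residual when shapes match."""
    def __init__(self, c_in, c_out, kernel=3, stride=1, expand=6):
        super().__init__()
        c_mid = c_in * expand
        self.use_res = stride == 1 and c_in == c_out
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_mid, 1, bias=False),                 # 1x1 expansion
            nn.BatchNorm2d(c_mid), nn.ReLU6(inplace=True),
            nn.Conv2d(c_mid, c_mid, kernel, stride, kernel // 2,   # depthwise conv
                      groups=c_mid, bias=False),
            nn.BatchNorm2d(c_mid), nn.ReLU6(inplace=True),
            SqueezeExcite(c_mid),                                  # channel attention
            nn.Conv2d(c_mid, c_out, 1, bias=False),                # linear projection
            nn.BatchNorm2d(c_out),
        )
    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_res else out
```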
Neck
The neck of YOLOv11-EfficientNet blends and refines feature maps drawn from different depths of the network. The C3k2 block replaces the C2f block for more efficient feature fusion and faster processing, keeping calculations quick while preserving the performance and informative quality of extracted features.
The C2PSA module refines spatial information, allowing the model to attend to priority regions in the image. This high-quality feature processing enables precise detection under difficult conditions, including fluctuating illumination and heavy clutter. The neck processes features at multiple scales by fusing maps and enlarging resolutions, improving the model's understanding of object relationships for better detection in challenging environments; a minimal fusion sketch follows.
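The snippet below shows the basic fusion step the neck performs, upsampling a deep, low-resolution map and concatenating it with a shallower, high-resolution one before refinement; the tensor shapes are hypothetical.

```python
import torch
import torch.nn as nn

deep = torch.randn(1, 256, 20, 20)     # coarse features (hypothetical shapes)
shallow = torch.randn(1, 128, 40, 40)  # finer features from an earlier stage

up = nn.Upsample(scale_factor=2, mode="nearest")(deep)   # -> 1 x 256 x 40 x 40
fused = torch.cat([up, shallow], dim=1)                  # -> 1 x 384 x 40 x 40
refined = nn.Conv2d(384, 128, kernel_size=1)(fused)      # channel reduction
```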
Head
The head produces the final object detection output: box coordinates and class probability predictions. This component has been optimized for both accuracy and efficiency. The C3k2 block is used widely throughout the head to refine the feature maps that feed the final predictions; despite its compact design, it remains effective while saving parameters for faster inference.
CBS (Convolution-BatchNorm-SiLU) layers improve feature extraction and keep training stable. Data normalization and smooth transitions between layers make the model more robust, and the Sigmoid Linear Unit (SiLU) activation enables stable gradient flow, reducing training instabilities.
A final cascade of convolutional layers produces the outputs required for object detection. The Detect layer unifies the predictions, yielding boundary coordinates, objectness scores, and class probabilities. This well-designed structure lets YOLOv11 deliver fast inference and precise detection across multiple object classes.
SAM2 is a foundation model for promptable visual segmentation in both images and videos. It expands on the base SAM model through innovative data processing methods, upgraded model structures, and a massive video-oriented training collection. SAM2 introduces a transformer-based architecture with streaming memory that processes video frames in real time at high accuracy. It was trained on the vast Segment Anything Video (SA-V) dataset, which contains 35.5 million masks across 50.9 thousand videos, the largest video segmentation dataset available today. These enhancements let SAM2 produce superior segmentation results with fewer user interactions than traditional methods.
Architecture of SAM2
SAM2 builds on the SAM foundation but substantially redefines the original architecture to process video data. The model analyzes video frames in sequence, using a memory attention feature to maintain continuous understanding through time. The system comprises several functional units: an image encoder, a memory attention module, a prompt encoder, a mask decoder, and a memory encoder operating through a memory bank.
Image Encoder
The image encoder extracts features from every frame of the video. SAM2's hierarchical encoder runs in a video-based framework built on Hiera and pre-trained with Masked Autoencoders (MAE). Each frame is processed into feature embeddings that later support the segmentation steps. SAM2's streaming mechanism receives video frames sequentially: each frame passes through the encoder, which generates feature tokens for the subsequent segmentation stages. This design lets SAM2 operate effectively in real time, fitting applications such as robotics, AR/VR technology, and automated vehicles.
The memory attention module is a distinctive feature that helps the system carry information from one frame to the next. Unlike conventional video segmentation models, SAM2 keeps a memory storage unit that preserves important data about detected objects. Composed of transformer blocks, the memory attention mechanism applies self-attention within the current frame and cross-attention between the current frame and the stored memory bank. By using previous segmentations in its predictions, SAM2 improves over time and produces consistent results through object transformations, including physical deformation, complete occlusion, and motion blur. This memory functionality lets SAM2 achieve more precise segmentation with fewer user corrections; a conceptual sketch follows.
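The sketch below is a conceptual illustration only, assuming a fixed-length FIFO bank of past frame features attended to by the current frame via a single cross-attention layer; the shapes and the one-layer design are our simplifying assumptions, not SAM2's actual architecture.

```python
import torch
import torch.nn as nn
from collections import deque

dim, mem_size = 256, 8
memory = deque(maxlen=mem_size)                  # FIFO: oldest entries drop out
cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

def attend(frame_tokens):
    """Condition current-frame tokens (1, tokens, dim) on the memory bank."""
    if memory:
        bank = torch.cat(list(memory), dim=1)    # (1, mem_tokens, dim)
        frame_tokens, _ = cross_attn(frame_tokens, bank, bank)
    memory.append(frame_tokens.detach())         # store for later frames
    return frame_tokens
```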
SAM2 extends SAM's promptable segmentation functions, supporting interactive segmentation through clicks, bounding boxes, and masks. The prompt encoder converts these user inputs into feature embeddings that direct the segmentation operation.
The mask decoder adapts SAM's design to process encoded prompts, image features, and memory-attended features, producing segmentation masks. Iterative processing takes place in a two-way transformer design that updates not only the prompt embeddings but also the frame embeddings. This iterative refinement lets users correct segmented output while preserving accuracy, avoiding excessive manual annotation.
The memory encoder translates segmentation outputs into compact representations for storage and later retrieval in future frames. These memory representations store critical object characteristics, enabling SAM2 to retrieve previous segmentations and refine its future predictions. The memory bank operates as a FIFO queue, storing segmentation outputs and preserving object pointers that carry semantic information. Because it stores memory representations of previous frames during processing, SAM2 can trace tracked objects through multiple video frames, a design that works well for video editing applications and surveillance systems.
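To illustrate how a detector's bounding box can prompt SAM2 for a mask, as done in this pipeline, a minimal sketch using Meta's sam2 package follows; the config and checkpoint paths, the stand-in image, and the example box coordinates are placeholder assumptions.

```python
import numpy as np
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Placeholder config and checkpoint paths for a SAM2 model.
predictor = SAM2ImagePredictor(
    build_sam2("configs/sam2.1/sam2.1_hiera_s.yaml", "sam2.1_hiera_small.pt")
)

image = np.zeros((640, 640, 3), dtype=np.uint8)   # stand-in for an MRI slice
box = np.array([120, 80, 300, 260])               # x1, y1, x2, y2 from YOLO

predictor.set_image(image)
masks, scores, _ = predictor.predict(box=box[None, :], multimask_output=False)
```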
Dataset
This research uses a dataset sourced from Roboflow consisting of 1,003 images, systematically split into training and validation sets to support model learning and evaluation [58]. The model learns a wide variety of patterns and features from the 85% of the dataset (852 images) reserved for training. The remaining 15% (151 images) is set aside for validation, to evaluate performance on unseen data and ensure generalization. To keep the model input uniform, all images are stretch-resized to 640×640 pixels. This resizing prevents image size variation from affecting model performance, maintaining computational efficiency without compromising the structural integrity of tumor features.
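The stretch-resize step amounts to the following; the file name is a placeholder.

```python
from PIL import Image

# Stretch-resize to a fixed 640x640 input, ignoring aspect ratio.
img = Image.open("scan_001.jpg").convert("RGB")
img_640 = img.resize((640, 640), resample=Image.BILINEAR)
```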
Figure 2. Dataset
Result Analysis
The study applied pretrained YOLOv11 models for brain tumor detection, yielding strong results. Compared with earlier YOLO versions, the optimized model demonstrated a significant improvement in both training and testing accuracy, achieving 87.4% mean average precision (mAP@50). Precision stood at 93.7% and recall at 75.3%, highlighting the model's ability to reduce false positives, while the F1 score reached 83.5%, underlining its overall robustness. Detailed metrics further validate the model's performance.
Figure 3. Confusion Matrix
The confusion matrix shown in Figure 3 is a fundamental tool for evaluating classification models. This grid compares predictions against actual outcomes to quantify precision and identify detection mistakes. True Positives (TP) are correct positive-class predictions; False Positives (FP) are Type I errors, negative instances predicted as positive; False Negatives (FN) are Type II errors, positive cases misidentified as negative; and True Negatives (TN) are correct negative-class predictions. Multiple performance metrics, including precision, recall, and F1 score, are calculated from these values to assess the effectiveness of classification models. An ideal model maximizes TP and TN while minimizing FP and FN, ensuring reliable and accurate predictions.
\text{Precision} = \frac{TP}{TP + FP} \quad (1)

\text{Recall} = \frac{TP}{TP + FN} \quad (2)

F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (3)

\text{mAP@50} = \frac{1}{N} \sum_{i=1}^{N} AP_i \quad (4)
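As a worked example of Equations (1)-(3), the snippet below plugs in the Brain-Tumor counts reported alongside the confusion matrix; treating the 10 background misclassifications as false positives and the 16 missed tumor cases as false negatives is our reading of the text, so the resulting values are illustrative rather than the paper's reported metrics.

```python
# Counts taken from the confusion-matrix discussion (interpretation assumed).
tp, fp, fn = 145, 10, 16

precision = tp / (tp + fp)                           # Eq. (1)
recall = tp / (tp + fn)                              # Eq. (2)
f1 = 2 * precision * recall / (precision + recall)   # Eq. (3)
print(f"P={precision:.3f}  R={recall:.3f}  F1={f1:.3f}")
```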
Figure 4. Performance curves: (a) Precision-Recall curve, (b) Precision-Confidence curve, (c) F1-Confidence curve, (d) Recall-Confidence curve
The model is evaluated through the four curves in Figure 4. Curve (a) is the precision-recall (PR) curve, which reflects how well the model maintains precision while reaching higher recall; a well-performing model has a large area under the PR curve. Curve (b), the Precision-Confidence (P-Confidence) curve, shows how precision varies with the confidence threshold: higher confidence values usually lead to higher precision. Curve (c), the F1-Confidence curve, depicts the balance between precision and recall at various confidence levels and helps determine the threshold that yields maximum overall performance. Finally, curve (d), the Recall-Confidence (R-Confidence) curve, shows how recall depends on confidence, revealing the trade-off that arises as the model becomes more selective. Together, these curves give a complete evaluation of the model's predictive capability.
The confusion matrix in Figure 3 summarizes the model's classification performance in locating brain tumors and distinguishing categories. The matrix classes are Brain-Tumor, Eye, and Background. The model correctly classified 145 brain tumor cases but misclassified 16 brain tumor cases into other categories. For the Eye category, the model got 19 correct and 1 incorrect. A further 10 instances were misclassified as Brain-Tumor and 4 as Eye, suggesting some residual confusion in distinguishing these two categories.
Figure 5. Performance Curves
Figure 5 illustrates the training and validation performance of the object detection model over multiple epochs.
The top row of subplots shows the training loss components: train/box_loss, train/cls_loss, and train/dfl_loss.
Each of these losses exhibits a smooth, downward trend, indicating effective learning and improved prediction
accuracy during training. Additionally, the training precision and recall both increase steadily, suggesting that
the model becomes more capable of correctly identifying and classifying objects while minimizing false
detections.
In the bottom row, the validation loss components val/box_loss, val/cls_loss, and val/dfl_loss also decrease over
time, reflecting good generalization to unseen data. Although a brief spike is observed in the validation
classification loss, it eventually stabilizes, which is typical during training. The mean average precision at 0.5
IoU and the stricter mean average precision averaged over IoUs from 0.5 to 0.95 show consistent improvement
across epochs. This indicates the model’s increasing capability to detect and localize objects with high accuracy.
The charts above present a performance comparison of four object detection systems, YOLOv11-EfficientNet+SAM2, Swin Transformer, SSD, and EfficientDet, over 25 training epochs across precision, recall, and accuracy (mAP@50). YOLOv11-EfficientNet+SAM2 outperforms all other models, attaining 93.7% precision and 75.3% recall for an mAP@50 of 87.4% and an F1 score of 83.5%, identifying objects precisely under varied operating conditions. The Swin Transformer ranks second with 81.2% precision, 64.1% recall, 73.2% mAP@50, and a 71.7% F1 score, performing effectively across multiple scenarios. SSD retains its well-known speed advantage while achieving 71.3% precision, 58.2% recall, and 65.7% mAP@50, making it best suited to real-time applications rather than complex detection requirements. EfficientDet, constrained by its focus on computational efficiency, reaches 64.5% precision, 49.8% recall, 59.3% mAP@50, and a 56.1% F1 score, limiting its suitability for precise detection settings. Combining EfficientNet with SAM2 in YOLOv11 thus yields a model that achieves superior results on every performance measure. With EfficientNet as its base, YOLOv11 attains higher accuracy alongside significant gains in processing speed and operational efficiency. The model processes each image in 0.2 ms of preprocessing, 2.9 ms of inference, 2.0 ms of postprocessing, and negligible loss computation (0.0 ms); the complete pipeline executes in under 6 ms, which is suitable for real-time tumor detection in MRI-based medical workflows. This efficiency allows the system to analyze large data volumes at high speed, supporting rapid diagnostic decisions and integration into automated radiology tools.
Conclusion
Brain tumors arising from uncontrolled cell growth constitute a medical emergency that requires prompt detection. Traditional deep learning tools for brain tumor classification show accuracy and efficiency problems that demand more effective solutions. The proposed system addresses earlier detection shortcomings by combining YOLOv11-EfficientNet with the Segment Anything Model 2 (SAM2) into a comprehensive detection method. The YOLOv11-EfficientNet model delivered strong results, with precision of 0.937, recall of 0.753, and mAP@50 of 0.874, demonstrating its capability as a reliable brain tumor detection system. SAM2 further elevated diagnostic performance by producing precise tumor boundary segmentation, which is crucial for medical evaluation procedures. Executing detection and segmentation together, the model operates at 6.3 GFLOPs, ensuring real-time processing capability. The system offers clear clinical benefits, helping radiologists reach more confident diagnoses faster. Practical issues remain to be addressed, including reliable detection of multiple tumors within one MRI scan and adaptation to different imaging settings. Future research will focus on enhancing multi-tumor recognition, developing specialized dataset subsets, and adding multimodal imaging such as combined MRI and CT, which would improve the model's predictive power and clinical usefulness.
References
[1] Vargo M. Brain tumor rehabilitation. American Journal of Physical Medicine & Rehabilitation 2011
May;90(5 Suppl 1):S50-62. doi: 10.1097/PHM.0b013e31820be31f. PMID: 21765264.
[2] Kurt P. Schellhas, Clyde H. Wilkes. Temporomandibular Joint Inflammation: Comparison of MR Fast
Scanning with T1- and T2-Weighted Imaging Techniques
[3] Cox DD, Savoy RL. Functional magnetic resonance imaging (fMRI) "brain reading": detecting and
classifying distributed patterns of fMRI activity in human visual cortex. Neuroimage. 2003 Jun;19(2 Pt 1):261-
70. doi: 10.1016/s1053-8119(03)00049-1. PMID: 12814577.
[4] Jones, D.K., Leemans, A. (2011). Diffusion Tensor Imaging. In: Modo, M., Bulte, J. (eds) Magnetic
Resonance Neuroimaging. Methods in Molecular Biology, vol 711. Humana Press. https://doi.org/10.1007/978-
1-61737-992-5_6
[5] Buonocore MH, Maddock RJ. Magnetic resonance spectroscopy of the brain: a review of physical principles
and technical methods. Rev Neurosci. 2015;26(6):609-32. doi: 10.1515/revneuro-2015-0010. PMID: 26200810.
[6] Ehman, R.L. Magnetic resonance elastography: from invention to standard of care. Abdom Radiol 47, 3028–
3036 (2022). https://doi.org/10.1007/s00261-022-03597-z
[7] F. Ritter et al., "Medical Image Analysis," in IEEE Pulse, vol. 2, no. 6, pp. 60-70, Nov.-Dec. 2011, doi:
10.1109/MPUL.2011.942929.
[8] Manjón, J.V. (2017). MRI Preprocessing. In: Martí-Bonmatí, L., Alberich-Bayarri, A. (eds) Imaging
Biomarkers. Springer, Cham. https://doi.org/10.1007/978-3-319-43504-6_5
[9] Shorten, C., Khoshgoftaar, T.M. A survey on Image Data Augmentation for Deep Learning. J Big Data 6, 60
(2019). https://doi.org/10.1186/s40537-019-0197-0
[10] Z. Li, F. Liu, W. Yang, S. Peng and J. Zhou, "A Survey of Convolutional Neural Networks: Analysis,
Applications, and Prospects," in IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 12,
pp. 6999-7019, Dec. 2022, doi: 10.1109/TNNLS.2021.3084827.
[11] T. Hossain, F. S. Shishir, M. Ashraf, M. A. Al Nasim and F. Muhammad Shah, "Brain Tumor Detection
Using Convolutional Neural Network," 2019 1st International Conference on Advances in Science, Engineering
and Robotics Technology (ICASERT), Dhaka, Bangladesh, 2019, pp. 1-6, doi:
10.1109/ICASERT.2019.8934561.
[12] Arunkumar, N., Mohammed, M.A., Abd Ghani, M.K. et al. K-Means clustering and neural network for
object detecting and identifying abnormality of brain tumor. Soft Comput 23, 9083–9096 (2019).
https://doi.org/10.1007/s00500-018-3618-7
[13] Alam MS, Rahman MM, Hossain MA, Islam MK, Ahmed KM, Ahmed KT, Singh BC, Miah MS.
Automatic Human Brain Tumor Detection in MRI Image Using Template-Based K Means and Improved Fuzzy
C Means Clustering Algorithm. Big Data and Cognitive Computing. 2019; 3(2):27.
https://doi.org/10.3390/bdcc3020027
[14] K. M. Priya, S. Kavitha and B. Bharathi, "Brain tumor types and grades classification based on statistical
feature set using support vector machine," 2016 10th International Conference on Intelligent Systems and
Control (ISCO), Coimbatore, India, 2016, pp. 1-8, doi: 10.1109/ISCO.2016.7726910.
[15] Florimbi G, Fabelo H, Torti E, Lazcano R, Madroñal D, Ortega S, Salvador R, Leporati F, Danese G, Báez-
Quevedo A, et al. Accelerating the K-Nearest Neighbors Filtering Algorithm to Optimize the Real-Time
Classification of Human Brain Tumor in Hyperspectral Images. Sensors. 2018; 18(7):2314.
https://doi.org/10.3390/s18072314
[16] Lefkovits, L., Lefkovits, S., Szilágyi, L. (2016). Brain Tumor Segmentation with Optimized Random
Forest. In: Crimi, A., Menze, B., Maier, O., Reyes, M., Winzeck, S., Handels, H. (eds) Brainlesion: Glioma,
Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2016. Lecture Notes in Computer Science(),
vol 10154. Springer, Cham. https://doi.org/10.1007/978-3-319-55524-9_9
[17] Shashank Reddy, D., Naga Harshitha, C., & Mary Belinda, C. (2018). Brain tumor prediction using naïve
Bayes’ classifier and decision tree algorithms. International Journal of Engineering and Technology, 7(1.7),
137-141. https://doi.org/10.14419/ijet.v7i1.7.10634
[18] R. Ezhilarasi and P. Varalakshmi, "Tumor Detection in the Brain using Faster R-CNN," 2018 2nd International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Palladam, India, 2018, pp. 388-392, doi: 10.1109/I-SMAC.2018.8653705.
[19] S. Singh, "A Novel Mask R-CNN Model to Segment Heterogeneous Brain Tumors through Image Subtraction," arXiv:2204.01201 [eess.IV], 2022. https://doi.org/10.48550/arXiv.2204.01201
[20] Hikmah, N., Hajjanto, A., A. Surbakti, A., Prakosa, N., Asmaria, T., & Sardjono, T. (2024). Brain tumor
detection using a MobileNetV2-SSD model with modified feature pyramid network levels. International Journal
of Electrical and Computer Engineering (IJECE), 14(4), 3995-4004.
doi: http://doi.org/10.11591/ijece.v14i4.pp3995-4004.
[21] Kang, M., Ting, CM., Ting, F.F., Phan, R.CW. (2023). RCS-YOLO: A Fast and High-Accuracy Object
Detector for Brain Tumor Detection. In: Greenspan, H., et al. Medical Image Computing and Computer
Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14223.
Springer, Cham. https://doi.org/10.1007/978-3-031-43901-8_57
[24] N. Noreen, S. Palaniappan, A. Qayyum, I. Ahmad, M. Imran and M. Shoaib, "A Deep Learning Model
Based on Concatenation Approach for the Diagnosis of Brain Tumor," in IEEE Access, vol. 8, pp. 55135-55144,
2020, doi: 10.1109/ACCESS.2020.2978629.
[25] P. Chauhan et al., "PBVit: A Patch-Based Vision Transformer for Enhanced Brain Tumor Detection,"
in IEEE Access, vol. 13, pp. 13015-13029, 2025, doi: 10.1109/ACCESS.2024.3521002.
[26] A. A. Asiri, T. A. Soomro, A. A. Shah, G. Pogrebna, M. Irfan and S. Alqahtani, "Optimized Brain Tumor
Detection: A Dual-Module Approach for MRI Image Enhancement and Tumor Classification," in IEEE Access,
vol. 12, pp. 42868-42887, 2024, doi: 10.1109/ACCESS.2024.3379136
[27] H. A. Shah, F. Saeed, S. Yun, J. -H. Park, A. Paul and J. -M. Kang, "A Robust Approach for Brain Tumor
Detection in Magnetic Resonance Images Using Finetuned EfficientNet," in IEEE Access, vol. 10, pp. 65426-
65438, 2022, doi: 10.1109/ACCESS.2022.3184113
[28] K. Neamah et al., "Brain Tumor Classification and Detection Based DL Models: A Systematic Review,"
in IEEE Access, vol. 12, pp. 2517-2542, 2024, doi: 10.1109/ACCESS.2023.3347545.
[29] M. S. Majib, M. M. Rahman, T. M. S. Sazzad, N. I. Khan and S. K. Dey, "VGG-SCNet: A VGG Net-Based
Deep Learning Framework for Brain Tumor Detection on MRI Images," in IEEE Access, vol. 9, pp. 116942-
116952, 2021, doi: 10.1109/ACCESS.2021.3105874.
[30] L. Annet Abraham, G. Palanisamy and V. Goutham, "Dilated Convolution and YOLOv8 Feature
Extraction Network: An Improved Method for MRI-Based Brain Tumor Detection," in IEEE Access, vol. 13,
pp. 27238-27256, 2025, doi: 10.1109/ACCESS.2025.3539924.
[31] S. Solanki, U. P. Singh, S. S. Chouhan and S. Jain, "Brain Tumor Detection and Classification Using
Intelligence Techniques: An Overview," in IEEE Access, vol. 11, pp. 12870-12886, 2023, doi:
10.1109/ACCESS.2023.3242666
[32] M. F. Almufareh, M. Imran, A. Khan, M. Humayun and M. Asim, "Automated Brain Tumor Segmentation
and Classification in MRI Using YOLO-Based Deep Learning," in IEEE Access, vol. 12, pp. 16189-16207,
2024, doi: 10.1109/ACCESS.2024.3359418.
[33] A. Younis et al., "Abnormal Brain Tumors Classification Using ResNet50 and Its Comprehensive
Evaluation," in IEEE Access, vol. 12, pp. 78843-78853, 2024, doi: 10.1109/ACCESS.2024.3403902.
[34] S. Ahmad and P. K. Choudhury, "On the Performance of Deep Transfer Learning Networks for Brain
Tumor Detection Using MR Images," in IEEE Access, vol. 10, pp. 59099-59114, 2022, doi:
10.1109/ACCESS.2022.3179376.
[35] S. Asif, W. Yi, Q. U. Ain, J. Hou, T. Yi and J. Si, "Improving Effectiveness of Different Deep Transfer
Learning-Based Models for Detecting Brain Tumors From MR Images," in IEEE Access, vol. 10, pp. 34716-
34730, 2022, doi: 10.1109/ACCESS.2022.3153306.
[36] A. H. Abdel-Gawad, L. A. Said and A. G. Radwan, "Optimized Edge Detection Technique for Brain
Tumor Detection in MR Images," in IEEE Access, vol. 8, pp. 136243-136259, 2020, doi:
10.1109/ACCESS.2020.3009898
[37] A. Farzamnia, S. H. Hazaveh, S. S. Siadat and E. G. Moung, "MRI Brain Tumor Detection Methods Using
Contourlet Transform Based on Time Adaptive Self-Organizing Map," in IEEE Access, vol. 11, pp. 113480-
113492, 2023, doi: 10.1109/ACCESS.2023.3322450.
[38] Q. Yao, D. Zhuang, Y. Feng, Y. Wang and J. Liu, "Accurate Detection of Brain Tumor Lesions From
Medical Images Based on Improved YOLOv8 Algorithm," in IEEE Access, vol. 12, pp. 144260-144279, 2024,
doi: 10.1109/ACCESS.2024.3472039
[39] N. Bibi et al., "A Transfer Learning-Based Approach for Brain Tumor Classification," in IEEE Access, vol.
12, pp. 111218-111238, 2024, doi: 10.1109/ACCESS.2024.3425469.
[40] A. Jabbar, S. Naseem, T. Mahmood, T. Saba, F. S. Alamri and A. Rehman, "Brain Tumor Detection and
Multi-Grade Segmentation Through Hybrid Caps-VGGNet Model," in IEEE Access, vol. 11, pp. 72518-72536,
2023, doi: 10.1109/ACCESS.2023.3289224.
[41] M. Wageh, K. Amin, A. D. Algarni, A. M. Hamad and M. Ibrahim, "Brain Tumor Detection Based on
Deep Features Concatenation and Machine Learning Classifiers With Genetic Selection," in IEEE Access, vol.
12, pp. 114923-114939, 2024, doi: 10.1109/ACCESS.2024.3446190
[42] W. Wang, F. Bu, Z. Lin and S. Zhai, "Learning Methods of Convolutional Neural Network Combined With
Image Feature Extraction in Brain Tumor Detection," in IEEE Access, vol. 8, pp. 152659-152668, 2020, doi:
10.1109/ACCESS.2020.3016282.
[43] R. Preetha, M. J. P. Priyadarsini and J. S. Nisha, "Automated Brain Tumor Detection From Magnetic
Resonance Images Using Fine-Tuned EfficientNet-B4 Convolutional Neural Network," in IEEE Access, vol. 12,
pp. 112181-112195, 2024, doi: 10.1109/ACCESS.2024.3442979.
[44] A. S. Musallam, A. S. Sherif and M. K. Hussein, "A New Convolutional Neural Network Architecture for
Automatic Detection of Brain Tumors in Magnetic Resonance Imaging Images," in IEEE Access, vol. 10, pp.
2775-2782, 2022, doi: 10.1109/ACCESS.2022.3140289.
[45] K. Lata, P. Singh, S. Saini and L. R. Cenkeramaddi, "Deep Learning-Based Brain Tumor Detection in
Privacy-Preserving Smart Health Care Systems," in IEEE Access, vol. 12, pp. 140722-140733, 2024, doi:
10.1109/ACCESS.2024.3456599.
[46] A. Hossain et al., "A YOLOv3 Deep Neural Network Model to Detect Brain Tumor in Portable
Electromagnetic Imaging System," in IEEE Access, vol. 9, pp. 82647-82660, 2021, doi:
10.1109/ACCESS.2021.3086624.
[47] H. M. T. Khushi, T. Masood, A. Jaffar, M. Rashid and S. Akram, "Improved Multiclass Brain Tumor
Detection via Customized Pretrained EfficientNetB7 Model," in IEEE Access, vol. 11, pp. 117210-117230,
2023, doi: 10.1109/ACCESS.2023.3325883
[48] B. Mallampati, A. Ishaq, F. Rustam, V. Kuthala, S. Alfarhood and I. Ashraf, "Brain Tumor Detection Using
3D-UNet Segmentation Features and Hybrid Machine Learning Model," in IEEE Access, vol. 11, pp. 135020-
135034, 2023, doi: 10.1109/ACCESS.2023.3337363.
[49] S. Rajendran et al., "Automated Segmentation of Brain Tumor MRI Images Using Deep Learning,"
in IEEE Access, vol. 11, pp. 64758-64768, 2023, doi: 10.1109/ACCESS.2023.3288017.
[50] K. Ejaz, M. S. M. Rahim, U. I. Bajwa, H. Chaudhry, A. Rehman and F. Ejaz, "Hybrid Segmentation
Method with Confidence Region Detection for Tumor Identification," in IEEE Access, vol. 9, pp. 35256-35278,
2021, doi: 10.1109/ACCESS.2020.3016627.
[51] Z. Jia and D. Chen, "Brain Tumor Identification and Classification of MRI images using deep learning
techniques," in IEEE Access, doi: 10.1109/ACCESS.2020.3016319.
[52] R. Agarwal, S. D. Pande, S. N. Mohanty and S. K. Panda, "A Novel Hybrid System of Detecting Brain
Tumors in MRI," in IEEE Access, vol. 11, pp. 118372-118385, 2023, doi: 10.1109/ACCESS.2023.3326447
[53] A. Anaya-Isaza and L. Mera-Jiménez, "Data Augmentation and Transfer Learning for Brain Tumor
Detection in Magnetic Resonance Imaging," in IEEE Access, vol. 10, pp. 23217-23233, 2022, doi:
10.1109/ACCESS.2022.3154061.
[54] S. Roy, R. Saha, S. Sarkar, R. Mehera, R. K. Pal and S. K. Bandyopadhyay, "Brain Tumour Segmentation
Using S-Net and SA-Net," in IEEE Access, vol. 11, pp. 28658-28679, 2023, doi:
10.1109/ACCESS.2023.3257722
[55] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin and B. Guo, "Swin Transformer: Hierarchical
Vision Transformer using Shifted Windows," in Proc. IEEE/CVF International Conference on Computer Vision
(ICCV), 2021, pp. 10012–10022, doi: 10.1109/ICCV48922.2021.00989
[56] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu and A. C. Berg, "SSD: Single Shot MultiBox
Detector," in Proc. European Conference on Computer Vision (ECCV), 2016, pp. 21–37, doi: 10.1007/978-3-
319-46448-0_2
[57] M. Tan, R. Pang and Q. V. Le, "EfficientDet: Scalable and Efficient Object Detection," in Proc. IEEE/CVF
Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp.10781–10790, doi:
10.1109/CVPR42600.2020.01080
[58] Dwyer, B., Nelson, J., Hansen, T., et al. (2024). Roboflow (Version 1.0) [Software]. Available from https://roboflow.com.