Machine-Learning Methods in Detecting Breast Cancer and Related Therapeutic Issues A Review

Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization

Ali Jafari

2024, VOL. 12, NO. 1, 2299093

Department of Computer Science and Statistics, K.N.Toosi University of Technology, Tehran, Tehran Province, Iran


In 2020, the World Health Organization reported that breast cancer (BC) resulted in the deaths of 685,000 people and is the most common cancer globally, with 7.8 million women diagnosed in the past five years. Machine learning (ML) techniques can help identify breast cancer early and define its type by analyzing tumor size. ML models have been used for image classification and cancer prediction, and have been shown to be beneficial for breast cancer diagnosis. The current systematic review aims to highlight the gaps and shortcomings of previous works regarding the use of ML for breast cancer prediction based on image processing. The review surveys recent publications to weigh the pros and cons of various ML and deep learning (DL) techniques, and can benefit medical practitioners seeking advanced therapies. Previous works mainly relied on support vector machines (SVM), k-nearest neighbours (KNN), and decision trees (DT) in detecting BC; however, other techniques, especially DL ones, can also be useful.

Received 23 May 2023
Accepted 20 December 2023

KEYWORDS
Breast cancer; machine learning; cancer detection; therapeutic methods; artificial intelligence

1. Introduction

In the modern age of algorithms, machine learning (ML) (Chen et al. 2021) and deep learning (DL) (Chen and Jain 2020) tools have revolutionised multiple industries, especially manufacturing, transportation, and government. In recent years, DL has delivered cutting-edge performance in various fields (Kaul et al. 2022), like speech processing, text analytics, and computer vision. The widespread use of ML/DL algorithms across many industries (including social media) makes these technologies indispensable to daily life (Mittal and Hasija 2020). ML/DL algorithms are also beginning to impact healthcare, a sector historically resistant to significant technological disruption (Latif et al. 2017). For example, the recognition of internal organs from medical scans, the recognition of interstitial lung illnesses, the detection of lung nodules, the reconstruction of diagnostic images (Yan et al. 2016), and the classification of brain tumours (Anthimopoulos et al. 2016) can be conducted using ML/DL algorithms in the medical field (Havaei et al. 2017).

Over recent years, notable advancements have been seen in using ML across diverse sectors and scholarly investigations (Bhardwaj et al. 2017). ML applied to electronic health records (EHRs) may provide useful insights that can be used to streamline hospital operations, enhance patient risk score systems, and anticipate the development of disease (Shickel et al. 2017). ML algorithms have significant value in healthcare because they can effectively analyse the vast quantities of healthcare data generated daily via electronic health records (Usmani and Jaafar 2022). Additionally, using ML techniques may facilitate the identification of optimal pharmaceutical dosages, reducing healthcare expenses for both patients and healthcare providers. ML has the potential to be used not only for dose determination but also for identifying the most suitable drug for individual patients. Statistical models that use the diversity and complexity of EHR-derived data remain comparatively uncommon and present an attractive field of study (Callahan and Shah 2017). A system that entails improving medical services to meet people's medical needs falls under the broad healthcare category. Patients, doctors, vendors, health organisations, and IT firms all work to protect and restore patient records in the healthcare industry. Indian healthcare has been one of the world's fastest-growing industries for the past 10 years (Sarwal et al. 2021). ML used in healthcare analyses can diagnose various illnesses, including cancer, diabetes, strokes, and other conditions. Lung cancer, breast cancer (BC), prostate cancer, stomach cancer, and various other kinds of cancer can be diagnosed using ML based on image processing. Each year, 12% of lung cancer diagnoses and 10% of deaths are documented. Similar statistics apply to BC, where 11% of incidences result in 9% of fatalities. This occurs in all cancer types. It is necessary to produce accurate and high-quality data to analyse cancer in the healthcare system. In a competitive world, healthcare must employ data to boost healthcare quality and reduce the costs associated with treatment (Dhillon and Singh 2019).

BC is a type of cancer that forms in the breast when cells grow out of control. BC cells usually create a tumour that is seen on X-rays or felt as a lump. It is imperative to understand that most breast lumps are benign and not cancerous or malignant (Sharma et al. 2010). Non-cancerous breast tumours grow abnormally but do not spread outside the breast. They are not life-threatening, but some benign breast lumps increase the risk of BC in women (Waks and Winer 2019). It should be
mentioned that setting up preventative initiatives at the primary level is too rigid. Therefore, systematic and thorough screening programmes that accurately and promptly identify the condition can reduce the significant resulting complications and deaths. Mammography, thermography, ultrasonography, and breast biopsy are frequently used screening techniques. However, many communities, particularly in developing nations, lack access to the aforementioned solutions since they are frequently too expensive and difficult to implement. As a result, BC incidence and mortality have increased in these communities. Additionally, due to the scarcity of time-oriented and cumulative data, errors in case identification are unavoidable.

The predominant manifestation of BC often presents as the emergence of a novel lump or mass; however, it is important to note that most breast lumps are non-cancerous (Graydon et al. 1997). A painless, firm lesion with irregular margins indicates malignancy; nevertheless, breast carcinomas may manifest as soft, rounded, sensitive, or even painful masses. The use of ML in image processing expedites the diagnosis process, after which tests can confirm the initial diagnoses. Currently, the breast imaging modalities most often used are mammography, ultrasound, and magnetic resonance imaging (MRI) of the breast. Additional diagnostic procedures, such as computed tomography (CT) scans, bone scans, or positron emission tomography (PET) scans, may be used on occasion to facilitate the determination of metastasis in cases of breast cancer (Dye et al. 2012). There are different tests to diagnose BC. If the doctor finds signs of cancer during a test, or if the patient notices signs that indicate the possibility of this type of cancer, more tests are needed to confirm the diagnosis. Tests such as mammograms, breast ultrasound, and breast MRIs are mainly considered for detection purposes. Having regular check-ups is the best way to deal with this cancer. However, computer-aided diagnosis (CAD) can lead to better results as it reduces trial costs and has better accuracy. Machine learning is a subfield of artificial intelligence that combines a range of statistical, probabilistic, and optimisation techniques to enable computers to ‘learn’ from previous examples and find difficult-to-detect patterns in vast, noisy, or complex data sets. Consequently, machine learning is increasingly being employed in cancer detection and diagnosis (Osareh and Shadgar 2010). Solid evidence supports the efficacy of conventional laboratory methods like CT and MRI (Tabl et al. 2019). Nevertheless, they fail to disclose anything regarding the mechanism behind cancer development. However, developments in DNA microarray technology have made it possible to collect large numbers of specimens of gene expression. Treatment for breast cancer and survival rates can be used as training data for ML models. Hence, the ML models can be widely employed in medical fields and health assessment.

After analysing the input variables used to predict BC, ML models can be considered for BC prediction. For this purpose, ML algorithms such as Naive Bayes (NB), Bayesian network (BNeT), random forest (RF), multilayer perceptron (MLP), SVM, eXtreme Gradient Boosting (XGBoost), and the C4.5 decision tree have been implemented previously. Ensemble learning is an effective technique for enhancing prediction and reducing the possibility of overfitting. Despite the remarkable interest of scientists in dealing with BC using the various ML techniques, many gaps and shortcomings still need to be tackled as soon as possible. Hence, the current review aims to examine the limitations of the presented ML methods to open a new way for more research. The ML methods' accuracy and speed for detecting BC are investigated here as an innovation. The highest level of accuracy, amounting to 78.6%, was attained through the utilisation of an MLP classifier to identify breast cancer. This classification task was performed using the Wisconsin Prognosis Breast Cancer dataset (WPBC), initially made available in 1970, in the study conducted by Yue et al. (Yue et al. 2018). Technologies utilised in the healthcare sector encompass the management and retrieval of electronic medical records pertaining to patients and the instruments employed in the process. The identification of cancer has consistently been a significant problem in the realm of diagnosing and developing treatment strategies for haematological illnesses. An overwhelming percentage of the population is affected by one or more diseases. In recent years, there have been significant advancements in medical science. Notwithstanding these developments, a substantial knowledge deficit persists among the general population regarding health and illness. A significant segment of the populace likely experiences health ailments, potentially including those of a life-threatening kind (Kamboj et al. 2021). Accordingly, the issue of accuracy and precision in detection has always been a striking topic that has yet to be comprehensively covered. As seen from the literature, this is the only review that has examined the challenges and benefits of the various ML techniques in terms of accuracy and speed for BC prediction.

The primary purpose of the work is to examine the limitations and strengths of the previous ML methods in detecting breast cancer. Hence, the suggestions obtained from the gaps in the previous works are given. The advent of ML has presented a promising prospect in the battle against breast cancer. ML approaches have exhibited remarkable potential in the timely detection and categorisation of different breast cancer types through the analysis of tumour sizes. ML has demonstrated significant use in the field of image classification, allowing precise prognostications for a range of cancer types, such as breast cancer. Although breast cancer is primarily observed in women, it is crucial to acknowledge that this condition can also afflict men. The importance of new technology in properly addressing breast cancer is evident when considering the anticipation of a specific death rate for this disease in 2025. To address this pressing matter, a thorough systematic evaluation is undertaken, specifically focusing on the numerous machine learning models employed in diagnosing breast cancer. The second purpose of this review is to evaluate the accuracy, precision, and recall of several machine learning models used in breast cancer prediction using image processing techniques. The third aim of the review is to examine recent papers, carefully analysing the advantages and disadvantages of various ML and DL approaches. The knowledge acquired from this extensive assessment can significantly transform the approaches of healthcare professionals in their pursuit of innovative treatments for breast cancer. Prior studies predominantly utilised support vector machines (SVM), k-nearest neighbours (KNN), and decision trees (DT) as primary methods
for breast cancer diagnosis. However, this review emphasises the need to investigate and incorporate other strategies associated with deep learning methodologies. The use of these sophisticated methodologies has the potential to significantly enhance the precision and effectiveness of breast cancer prediction models. In summary, this study has the potential to facilitate significant progress in breast cancer detection and prognosis, providing a framework for incorporating state-of-the-art ML and DL methodologies. This research seeks to further the medical community's efforts in addressing breast cancer by identifying and discussing the shortcomings of prior methodology. The intention is to encourage the adoption of more rigorous, precise, and effective procedures.

The rest of the paper is organised as follows: The second section briefly reviews the related works and gives essential information. The role of ML in detecting BC is studied in the third section. The fourth section examines the various abilities of the ML method based on the related works. Finally, the fifth section draws on the significant findings and suggestions for future works.

2. Methodology

The use of ML techniques in medical applications is one of the significant achievements of modern technology. Computer-aided techniques have been proposed to tackle the problems and shortcomings of traditional methods. As shown in Figure 1, the term ‘machine learning in medical application’ has been widely used in the literature.

Figure 1. The growing trend for the applications of ML in the medical field.

The review conducted here is systematic and particularly focuses on the applications of ML in the treatment of BC. Accordingly, about 120 papers were initially extracted from the main search engines, such as Google Scholar and Science Direct. After categorising the related papers based on their purposes and methods, about 76 papers were cited in this review. The primary sources of paper extraction were Springer and Elsevier. The review outlines the major gaps and shortcomings that can be starting points for future works.

3. Literature review

Many existing studies in the broader literature have examined the effectiveness of ML methods in BC detection (Gayathri et al. 2013; Alarabeyyat and Alhanahnah; Bazazeh and Shubair). BC affects 8% of women over their lifetime; behind lung cancer, it is the second leading cause of fatalities in both the developed and developing worlds. BC is distinguished by gene mutation, continuous pain, and variations in the size, colour (redness), and skin texture of the breasts. Pathologists use BC classification to find a systematic and objective prognosis; the most common classification is binary (benign/malignant). ML techniques are now widely used in the classification of BC. They offer great classification accuracy as well as powerful diagnostic capabilities. In another study, two distinct classifiers were proposed for BC classification: the Naive Bayes (NB) classifier and the k-nearest neighbour (KNN) classifier (Amrane et al. 2018). The two classifiers were compared, using cross-validation to assess their accuracy. The outcomes indicated that KNN has the highest accuracy (97.51%) with the lowest error rate, followed by the NB classifier (96.19%). In another study, the support vector machine (SVM), KNN, random forest, artificial neural network (ANN), and logistic regression supervised machine learning algorithms were compared (Islam et al. 2020). The Wisconsin Breast Cancer dataset was taken from the UCI machine learning database, a well-known machine learning resource. Accuracy, specificity, sensitivity, precision, negative predictive value, false-negative rate, false-positive rate, F1 score, and the Matthews Correlation Coefficient were used to assess performance. Furthermore, the area under the precision-recall curve and the receiver operating characteristic curve of the various approaches were evaluated. The results showed that ANNs had the greatest accuracy, precision, and F1 score of 98.57%, 97.82%, and 0.9890, respectively, while SVM had the second-best accuracy, precision, and F1 score of 97.14%, 95.65%, and 0.9777, respectively. In 2021, J. Wu and C. Hicks proposed using gene expression data to classify patients with triple-negative breast cancer and non-triple-negative breast cancer using an ML approach (Wu and Hicks 2021). To identify the features (genes) used in the construction and validation of the classification models, RNA-sequencing data was analysed from 110 triple-negative and 992 non-triple-negative BC tumour samples from the Cancer Genome Atlas. Four classification models were tested for categorising the two forms of breast cancer, namely SVM, KNN, NB, and Decision Tree (DT), using features selected at varying threshold levels to train the models. The proposed approaches were applied to independent gene expression datasets to assess effectiveness and for validation. The SVM method categorised breast cancer more precisely into triple-negative and non-triple-negative breast cancer and exhibited fewer misclassification errors than the other three tested algorithms. The prediction results demonstrated that ML algorithms are practical for classifying breast cancer into triple-negative and non-triple-negative categories. In 2019, Ganggayah et al. (Ganggayah et al.
2019) employed various ML techniques to create models for detecting and visualising relevant prognostic factors of the BC survival rate.

Early disease diagnosis has been a critical challenge in medical studies due to recent population growth (Islam et al. 2020). With the tremendous population growth, the danger of death caused by breast cancer has increased rapidly. In recent years, many advances have been made in ML techniques, the primary medical tool (Allegra et al. 2012; Cardoso et al. 2016; Abou Tabl et al.). In 2019, Tabl et al. (Tabl et al. 2019) proposed a novel technique for detecting the symptoms of BC. Accordingly, the authors conducted clinical operations for 347 patients, considering the combination of feature selection techniques and a prediction method. The findings revealed that the proposed model can successfully detect the classes with high-performance measurements. Also, the ultrasonic scan is the most extensively utilised approach for diagnosing gynaecological illness, i.e. BC. The initial stage for recognising a breast cancer anomaly (malignant versus benign) is extracting the region of interest (ROI). For this purpose, a new strategy for breast ROI extraction was suggested to minimise false positive (FP) cases (Zeebaree et al. 2019). The efficacy of the suggested method was contrasted with the current methods used to segment various types of images. In another work (Saber et al. 2021), an innovative DL model based on the transfer-learning (TL) technique was proposed to help automatically detect and diagnose the suspicious BC region, evaluated using an 80–20 split and cross-validation techniques.

Furthermore, Sharma et al. (Sharma et al. 2018) compared three widely used machine learning algorithms and methodologies for breast cancer prediction: Random Forest (RF), KNN, and NB. The Wisconsin Diagnosis Breast Cancer data set was employed as a training set to examine the performance of several machine learning approaches in terms of essential characteristics such as accuracy and precision. The outcomes were highly comparable and might be applied to detection and therapy. In 2016, Asri et al. (Asri et al. 2016) used the Wisconsin Breast Cancer (original) dataset to analyse the performance of various ML methods, including SVM, DT, NB, and KNN. The primary goal was to evaluate the correctness of data classification and the efficiency and effectiveness of each method in terms of accuracy, precision, sensitivity, and specificity. The experimental findings demonstrated that SVM has the highest accuracy (97.13%) and the lowest error rate. All experiments were conducted in a simulated environment using the WEKA data mining tool. The Nottingham histological grade (NHG) for breast cancer is a well-established predictive indicator widely utilised in clinical decision-making. About 50% of patients are classed as grade 2, an intermediate-risk group with little clinical value. In another study, a novel histological grade model (DeepGrade) was developed using digital whole-slide histopathology images (WSIs) and deep learning to improve risk stratification in NHG 2 breast cancer patients (Wang et al. 2022). More studies are also summarised in Table 1 based on the methods and accuracy.

4. Use of computer-aided techniques for BC

The hypothesis of hormone reliance in breast cancer was initially postulated based on the observation of the disease's aggressive nature in younger women. In 1906, Beatson played a pivotal role in initiating the period of endocrine surgery, predating Jensen's discovery of oestrogen receptors in 1967 and the subsequent popularity of oophorectomy and adrenalectomy as methods of achieving castration (Horsley and Horsley 1962). Oestrogen receptor modulators, luteinising hormone-releasing agonists, and aromatase inhibitors progressively superseded the utilisation of these more aggressive approaches. The preservation of Halsted's legacy was temporarily upheld by Margottini and Veronesi in Milan, who also removed internal mammary nodes. Furthermore, other individuals expanded the concept of ‘radicality’ by including the removal of supraclavicular and mediastinal nodes. Nevertheless, during the late 19th and early 20th centuries, a progressive shift marked the decline of the belief that a surgeon's skill was indicated by the ability to make larger incisions and execute more extensive surgery. Patey and Handley, hailing from London, and Auchincloss Jr., based in
Table 1. A review of the related works based on the method and accuracy.
No | Reference | Method | Accuracy (reported result)
1 | (Allugunti 2022) | Convolutional Neural Network (CNN), SVM, RF, Convolutional Networks, SVM, RF | Processing mammography images before the use of ML techniques increased the accuracy
2 | (Michael et al. | KNN, SVM, RF, Xgboost, Lightgbm | Lightgbm has the accuracy, precision, recall, and the F1 score of 99.86%, 100%, 99.6%, and 99.8%
3 | (Siddiqui et al. 2021) | An Internet of Medical Things (IoMT) cloud-based model | The accuracies required for detecting ductal carcinoma, lobular carcinoma, mucinous carcinoma, and papillary carcinoma were 99.69%, 99.32%, 98.96%, and 99.32%
4 | (Ak 2020) | LR, KNN, SVM, NV, and DT | A logistic regression model with a classification accuracy of about
5 | (Sha et al. | You Only Look Once (YOLO) and RetinaNet | The accuracy and precision of 79% and 91% for the detection
6 | (Sha et al. 2020) | Image noise reduction, optimal image segmentation based on the convolutional neural network, a grasshopper optimization algorithm, and optimized feature extraction and feature selection based on the grasshopper optimization algorithm | 96% Sensitivity, 93% Specificity, 85% PPV, 97% NPV, 92% accuracy
7 | (Zheng et al. | Deep Learning assisted Efficient Adaboost Algorithm (DLA-EABA) | Accuracy, sensitivity, and specificity of 97%, 98%, and 96%
8 | (Wang et al. 2022) | Hybrid deep hybrid learning (CNN-GRU) | Accuracy, precision, sensitivity, and specificity of 86%, 85.5%, 85.6%, 84.7%
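The measures reported in Table 1 (accuracy, sensitivity, specificity, PPV or precision, NPV, F1 score) and the Matthews Correlation Coefficient mentioned in the literature review all derive from the four counts of a binary confusion matrix. The following is a minimal Python sketch of those standard definitions. The function name `confusion_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they do not come from any of the reviewed studies.

```python
from math import sqrt


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive common diagnostic metrics from binary confusion-matrix counts.

    tp/fn: malignant cases predicted malignant/benign.
    tn/fp: benign cases predicted benign/malignant.
    """
    sensitivity = _ratio(tp, tp + fn)   # recall, true positive rate
    specificity = _ratio(tn, tn + fp)   # true negative rate
    ppv = _ratio(tp, tp + fp)           # positive predictive value (precision)
    npv = _ratio(tn, tn + fn)           # negative predictive value
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f1 = _ratio(2 * tp, 2 * tp + fp + fn)
    mcc = _ratio(
        tp * tn - fp * fn,
        sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only (not taken from any cited study).
metrics = confusion_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting several of these measures together matters for BC datasets because, when benign cases dominate, a classifier can reach high accuracy while still missing many malignant cases; sensitivity and NPV expose that failure.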

New York, spearheaded a transformative movement aimed at modifying the radical mastectomy procedure while ensuring the preservation of the pectoralis major muscle (Thornes 1967). The rapid progression of medical radiation techniques for cancer cell eradication, alongside the development of novel kinds of chemotherapy that accomplish the same objective and induce medical castration or target altered tumour receptors, has necessitated a re-evaluation of methodologies employed in cancer management. The growing understanding of the biological characteristics of breast cancer and the limited efficacy of surgery as a standalone treatment accompanied these findings. The introduction of mammography for the early diagnosis of tiny lesions has significantly enhanced the surgical management of cancer (O H C D Panel 1979).

It is projected that in the year 2022, an estimated 287,850 cases of invasive breast cancer and 51,400 cases of ductal carcinoma in situ (DCIS) will be detected in women residing in the United States. Additionally, it is anticipated that 43,250 women will die as a result of breast cancer. The majority of invasive breast cancers, namely 83%, are detected in women who are 50 years old or above. Additionally, a significant proportion, 91%, of breast cancer-related fatalities are seen within this age demographic. Furthermore, half of all breast cancer deaths are reported among women aged 70 years or above. The median age of diagnosis for female breast cancer is generally seen to be 62 years; however, it tends to be slightly lower for Hispanic (57 years), Asian/Pacific Islander (API) (58 years), Black (60 years), and American Indian/Alaska Native (AIAN) (61 years) women compared to White women (64 years). This discrepancy can be attributed, at least in part, to variations in the age distribution of these respective populations. The median age at which individuals succumb to breast cancer is 69 years. However, this age varies across different racial and ethnic groups, with White women experiencing a median age of 70 years, Hispanic women 62 years, and API and Black women 63 years (Denise Jozwik et al. 2023). As of 1 January 2022, an estimated 4.1 million women in the United States were reported to have a documented medical history of breast cancer. Around 4% of these women are now suffering from metastatic illness, with over half of them first being diagnosed with early-stage (I-III) malignancies (Gallicchio et al. 2022).

The breast comprises several primary constituents, including lobules and ducts. Lobules are glandular structures responsible for breast milk production; they are organised in clusters that together comprise a lobe. Ducts are tiny conduits that transport breast milk from the lobules to the nipple (Ma et al. 2019).

Because breast cancer is the second most significant cause of death among women, accurate early identification can significantly reduce breast cancer mortality rates (Houssein et al. 2021). Radiologists can detect abnormalities more efficiently using computer-aided detection. Medical images provide information that can be used to detect and diagnose various diseases and abnormalities. Several modalities allow radiologists to investigate the interior structure, which has sparked interest in various research areas. Each of the modalities mentioned earlier is important in some medical domains.

Inspired by pattern recognition and computational learning theory, ML examines the study and construction of algorithms that can learn and make predictions based on data (Shailaja et al. 2018; Wiens and Shenoy 2018; Siddique and Chow 2021). Such algorithms do not simply follow programmed instructions but make predictions or decisions by building a model from sample input data. ML is used in computing tasks where designing and programming explicit algorithms with proper performance is difficult or impossible. Some applications include email filtering, detection of network intruders or malicious insiders attempting a data breach, optical character recognition, learning to rank, and machine vision. The need to automate decision-making processes has increased with the expansion of information technology applications in various fields. As one of the leading solutions to meet these needs, artificial intelligence uses methods based on machine learning. ML is closely related to computational statistics and often overlaps with it. The focus of this branch is computer prediction, and it has a strong link with mathematical optimisation, which also brings methods, theories, and applications into the field. Machine learning is sometimes conflated with data mining, although the latter focuses more on the exploratory analysis of data, known as unsupervised learning. ML can also be unsupervised and can be used to learn baseline behavioural profiles for various entities and then find meaningful anomalies. In data analysis, ML is a method for designing complex algorithms and models used for prediction; this is known as predictive analytics in the industry. These analytical models allow researchers, data scientists, engineers, and analysts to obtain reliable and repeatable decisions and results and, by learning from past relationships and trends, reveal hidden insights (Swain et al. 2022).

Much attention has been devoted to the applications of ML in detecting breast cancer, as shown in the literature. In 2020, Vaka et al. proposed a novel way of detecting breast cancer using ML algorithms (Vaka et al. 2020). To assess performance, the authors conducted an experimental analysis on a dataset. Compared to existing methods, the proposed method generated precise and efficient results. Data mining methods are crucial in predicting early-stage breast cancer. A strategy was also provided that increases the precision and efficiency of three different classifiers: DT, NB, and Sequential Minimal Optimization (SMO) (Mohammed et al. 2020). The classifiers were also validated and compared on two benchmark datasets: Wisconsin Breast Cancer (WBC) and the Breast Cancer dataset. Because the chance of cases falling within the majority class was very high, the ML models were considerably prone to assigning new observations to the majority class. This work addressed that difficulty. The authors employed a data-level technique, which entailed resampling the data to offset the effect of class imbalance. Ten-fold cross-validation was used for evaluation. Each classifier's efficiency was measured in terms of true positives, false positives, ROC curve, standard deviation (Std), and accuracy (AC). Experiments demonstrated that applying a resample filter improves the classifiers' performance, with SMO outperforming the others on the WBC dataset and J48 (the DT classifier) outperforming its competitors on the Breast Cancer dataset. In other work, two of the most prominent ML approaches were employed to classify the Wisconsin Breast Cancer
(Original) dataset. Their classification performance was compared using accuracy, precision,
recall, and ROC Area values (Bayrak et al. 2019). The SVM approach produced the best results
with the highest accuracy. On the Wisconsin Breast Cancer Diagnostic (WBCD) dataset, Bayrak
et al. compared the performance of five nonlinear machine learning algorithms: MLP, KNN,
Classification and Regression Trees (CART), Gaussian Naive Bayes (NB), and SVM (Bayrak et al.
2019). The major goal was to assess each algorithm’s performance in categorising data in terms
of efficiency and effectiveness based on classification test accuracy, precision, and recall.
An ML and image processing-based evolutionary strategy was presented to identify and detect
breast cancer (Jasti et al. 2022). To aid in classifying and detecting skin diseases, this
model integrated image preprocessing, feature extraction, feature selection, and ML
approaches. A geometric mean filter was utilised to improve the image’s quality. AlexNet was
employed for feature extraction, and the Relief algorithm was used to select features. The
model employed ML techniques such as least-squares SVM, KNN, RF, and NB for disease
categorisation and detection. The MIAS data collection was used in the experimental inquiry.
Using image analysis, this proposed method was useful for reliably recognising breast cancer.
Due to its complicated molecular variety, triple-negative breast cancer (TNBC) is challenging
to diagnose and treat. To address these issues, employing artificial intelligence to forecast
the cellular uptake of nanoparticles (NPs) against distinct cancer stages was suggested
(Alafeef et al. 2020). For the first time, the authors showed that an ML method combined with
distinctive cellular uptake responses for particular cancer types might successfully classify
various cancer cell types. This method optimised nanomaterials to achieve the best
structure-internalisation response for a specific particle (Alafeef et al. 2020).

Despite its high cost and numerous adverse effects, mammography is the most frequently
employed laboratory approach for detecting breast cancer. ML prediction has demonstrated
promising outcomes as an alternative strategy. Mojrian et al. provided a strategy for
detecting breast cancer using an extreme learning machine (ELM) classification model linked
with a radial basis function (RBF) kernel, termed ELM-RBF, utilising the Wisconsin dataset
(Mojrian et al. 2020). The proposed model’s performance was then compared to a linear SVM
model. The suggested model beat the linear SVM model, with RMSE, R², and MAPE values of
0.1719, 0.9374, and 0.0539, respectively. Furthermore, both models’ accuracy, precision,
sensitivity, specificity, validity, true positive rate (TPR), and false-negative rate (FNR)
were investigated. The ELM-RBF model performed better on these criteria than the SVM model.
A thermogram-based breast cancer detection method was proposed in another study (AlFayez
et al. 2020). This method was divided into four stages: (1) image preprocessing with
homomorphic filtering, top-hat transform, and adaptive histogram equalisation, (2) ROI
segmentation with binary masking and K-means clustering, (3) feature extraction with signature
boundary, and (4) classification with Extreme Learning Machine (ELM) and Multilayer Perceptron
(MLP) classifiers. The proposed method was tested using the general dataset DMR-IR. Various
experimental scenarios (for example, integration of geometrical and textural feature
extraction) were constructed and assessed using several measures (for example, accuracy,
sensitivity, and specificity). It was revealed that ELM-based outcomes outperformed MLP-based
results by more than 19%. In a 2021 review (Meenalochini and Ramkumar 2021), the methods and
processes proposed for cancer tumour classification were examined. The performance of several
classification techniques was compared. The authors stated that classification accuracy can be
enhanced using hybrid techniques. Notably, many databases can be used to assess the
performance of the various ML techniques in detecting breast cancer. The features are derived
from a scanned image of a fine needle aspirate (FNA) of a breast lump. The properties of the
cell nuclei seen in the image are typically described in the sources of the considered
databases. However, the effectiveness of the proposed methods across the various databases,
which is needed to verify the performance of the ML or DL techniques used, has been neglected
so far. The breast dataset is a comprehensive collection of data encompassing a significant
portion of the Prostate, Lung, Colorectal, and Ovarian (PLCO) data about breast cancer
incidence and death analysis. In the case of several women, the study records the occurrence
of multiple instances of breast cancer. However, the present document encompasses data about
the initial breast cancer diagnosis within the trial. The dataset has entries for
approximately 78,000 women participating in the PLCO trial. The Breast Secondary dataset
comprises data about supplementary breast malignancies recorded throughout the trial and
gathered using the Breast Cancer Supplement form. The collection comprises individual records
for about 78,000 women participating in the PLCO trial.

5. The various ML techniques

Medicine and health are very important for the continuation of human life, and the
applications of artificial intelligence in medicine and health have increased in recent years
(Chen et al. 2021). Research in the fields related to medicine and services for people with
disabilities indicates that artificial intelligence technology can create significant changes
in fields such as disease diagnosis, treatment methods, drug disorders, and medical image
processing (Jain and Chatterjee 2020). Among the other factors influencing the health of the
human body, we can mention exercise and healthy nutrition (McCoy et al. 2020). Artificial
intelligence, with the ability to analyse and process information quickly, as well as
technologies such as machine vision, the Internet of Things, and robotics, has been able to
provide software, platforms, and practical gadgets to improve the quality of human life in the
field of sports and nutrition (Ngiam and Khor 2019). Healthcare companies have the potential
to enhance healthcare efficiency and achieve cost savings via the utilisation of ML
technology. One potential use of ML in the healthcare domain is the advancement of algorithms
to enhance the management of patient information and the scheduling of appointments. This form
of ML can mitigate the inefficiencies associated with repetitive tasks in the healthcare
system, optimising time and resource use.
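The studies reviewed in the preceding section compare classifiers through accuracy, precision,
sensitivity (recall, TPR), specificity, and FNR. As a minimal sketch of how these quantities
relate to one another, the following Python snippet derives them from binary confusion-matrix
counts. The names ConfusionCounts, confusion_counts, and classification_metrics, as well as the
toy labels, are illustrative assumptions and are not taken from any of the cited works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (positive class = malignant)."""
    tp: int
    fp: int
    tn: int
    fn: int


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP/FP/TN/FN for two equally long label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(c):
    """Derive the usual summary metrics from confusion counts."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # also called recall or TPR
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "fnr": _ratio(c.fn, c.tp + c.fn),          # equals 1 - sensitivity
    }


# Toy labels invented for illustration: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = classification_metrics(confusion_counts(y_true, y_pred))
```

Because a missed malignancy (false negative) is usually costlier than a false alarm, sensitivity
and FNR are worth reading alongside accuracy rather than accuracy alone.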

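Several of the reviewed works evaluate classifiers with 10-fold cross-validation on
class-imbalanced data. One common way to keep the class ratio stable across folds is stratified
splitting. The sketch below is a plain-Python illustration of that idea, not the procedure of
any cited study; the function name stratified_k_fold, the seed, and the 30/70 toy labels are
assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs whose folds preserve the class ratio."""
    if k < 2:
        raise ValueError("k must be at least 2")

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal indices round-robin so every fold gets a near-equal share
        # of this class; the offset keeps overall fold sizes balanced.
        for pos, idx in enumerate(members):
            folds[(offset + pos) % k].append(idx)
        offset += len(members)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Toy labels invented for illustration: 30 malignant (1) and 70 benign (0).
labels = [1] * 30 + [0] * 70
splits = list(stratified_k_fold(labels, k=10, seed=0))
```

Each of the ten test folds here holds 3 malignant and 7 benign samples, so per-fold metrics are
computed on the same class ratio as the full dataset.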
Indeed, ML is a potent subfield within the realm of AI. However, it is essential to note that
AI encompasses a broader range of approaches and methodologies, each with distinct merits and
practical uses. The selection of an appropriate AI approach is contingent upon several
factors, including the particular situation at hand, the characteristics of the data, the
requirement for interpretability, and the intended objectives. Because it serves as the basis
for both learning and generalisation, access to an appropriate dataset is fundamental to
building successful machine learning models. However, there may be obstacles in the way of
fully realising the promise of ML approaches due to problems with data quality and
availability, bias, and ethical implications. In ML, it is vital to have efficient data
collection, curation, and preprocessing in place in order to make the most of the benefits
related to dataset needs and avoid the drawbacks. The primary objective of CAD is to enhance
the precision and uniformity of diagnostic imaging via the use of image processing, computer
vision, and machine learning methodologies, which provide remarkable performance.

The BC datasets of the University of California Irvine (UCI) are a benchmark online resource
utilised extensively in the literature. Nithya and Santhi obtained 97.8% efficiency using
a multi-boost ensemble technique (Frank and Asuncion 2010; Weli 2020). Another study proposed
an IoT-based student healthcare monitoring model using innovative technologies to continuously
assess student vital signs and identify biological and behavioural abnormalities (Souri et al.
2020). In this concept, crucial data was collected using IoT devices, and data analysis was
performed using ML methods to predict potential dangers of student physiological and
behavioural abnormalities. Student health information was saved in the cloud layer.
Accordingly, the data analysis tasks were conducted to determine the students’ health state.
The results of this layer determined the subsequent actions in the monitoring layer (Ngiam and
Khor 2019). The experimental findings showed that the suggested model efficiently and
accurately detected the students’ condition. The SVM attained an excellent accuracy of 99.1%
after analysing the given model, which was a good outcome for the study’s goal. The results
outperformed decision trees, random forests, and multilayer perceptron neural networks.

Table 2 reviews the various ML models used for detecting BC. As can be seen from Table 2, the
performance of SVM has been acceptable in the previous research, as it reached about 97%
accuracy. The comparison between the studies of Naji et al. and Khourdifi & Bahaj revealed
that the classification accuracy of SVM is better than that of the other classifiers
considered. Also, many other studies presented remarkable findings in this regard. For
instance, a novel technique termed BDR-CNN-GCN, which consisted of a graph convolutional
network (GCN) and a CNN, was proposed to better identify malignant lesions in breast
mammograms (Zhang et al. 2021). The proposed method outperformed the other state-of-the-art BC
detection methods and five suggested NN models. In another work (Zhang et al. 2018), the
open-access mini-MIAS dataset was selected, and cost-sensitive learning was used to balance
the dataset. The training set’s size was increased using data augmentation and a CNN with nine
additional layers. The authors contrasted the rectified linear unit (ReLU), leaky ReLU, and
parametric ReLU activation functions. Six pooling methods were also contrasted, including
rank-based average pooling, rank-based weighted pooling, rank-based stochastic pooling, and
average, max, and stochastic pooling. The findings emphasised the superiority of the DL method
over the traditional artificial intelligence methods in terms of detection accuracy. Notably,
ensemble algorithms combine the results of different base models to improve the overall
predictive performance in a single model. By combining the results of multiple models,

Table 2. A brief review of the methods, purposes, and accuracy.

No | References | Method | Purpose | Accuracy
1 | (Naji et al. 2021) | SVM, Random Forest, Logistic Regression, Decision tree and KNN | Anticipating and diagnosing BC based on ML models | The superiority of SVM with 97.2% accuracy
2 | (Khourdifi and Bahaj 2018) | RF, NB, SVM, and K-NN | Classification and prediction of BC | SVM with 97.9%
3 | (Ngiam and Khor 2019) | SVM, CART, NB and kNN | Detection and prediction | KNN, NB, and CART have better accuracy
4 | (Ghosh et al. 0000) | Wisconsin Breast Cancer (Diagnostic) Dataset. Long Short Term Memory (LSTM) and Gated Recurrent Unit (GRU) | Detection and anticipation | 99% accuracy
5 | (Fatima et al.) | Comprehensive analysis and review | Examining the various ML models | SVM had remarkable accuracy
6 | (Shanbehzadeh et al. 2022) | Naïve Bayes (NB), Bayesian network (BNeT), random forest (RF), MLP, SVM, C4.5, eXtreme Gradient Boosting (XGBoost), decision tree and two ensemble algorithms, including Confidence weighted voting and Voting | Assessing BC based on the evaluation of the previous ML techniques | With AUC values of 0.799 and 0.798, the RF algorithm had the best performance both before and after executing FS.
7 | (Nomani et al. 2022) | Particle swarm optimized wavelet neural network (PSOWNN) | BC prediction | Specificity of 98.8%, precision of 98.6%, accuracy of 95.2%
8 | (Vrdoljak et al. 2023) | Neoadjuvant systemic therapy (NST) based on XGBoost | Detection of axillary lymph node status | AUC: 0.762 [0.726, 0.795]
9 | (Dehdar et al. 2023) | XGBoost, RF, NNs, and LR | Anticipation of factors for delayed BC diagnosis in Iranian women | 99% accuracy
10 | (Mirza et al. 2023) | The diagnostic model’s accuracy was improved by cutting-edge statistical techniques and cross-validation with different ML techniques, which also forecasted a fresh diagnostic nine-gene | Reducing the number of genes in a large cohort of transcriptomics data in order to build a diagnostic model for cancer classification. | With accurate diagnosis and prognosis at a lower cost, the found gene signature biomarkers enhanced healthcare management.

the advantages and disadvantages of the various models emerge, correct predictions are
reinforced, and incorrect predictions are cancelled out. Most ensemble algorithms are ‘black
boxes’ because the underlying base models are randomly generated and are not led by exact
predictions (Ho; Ng and Soo).

In addition, the reported performance of ML techniques has varied across recent publications.
The RF approach demonstrated superior performance both before and after feature selection
(FS), achieving area under the curve (AUC) values of 0.799 and 0.798, respectively. Moreover,
the utilisation of the Confidence Weighted Voting technique enhanced the classifier’s
performance, leading to the optimal outcome of 80% (Shanbehzadeh et al. 2022). Another study
employed various machine learning classification techniques, including NB, LR, SVM, KNN, DT,
and ensemble techniques such as RF, AdaBoost, and XGBoost. These techniques were applied to
a breast cancer dataset, and their performance was assessed using various performance
measures. The findings (Nemade and Fegade 2023) indicated that the decision tree and XGBoost
classifiers exhibited the best accuracy, reaching 97%, compared to other models. Additionally,
the XGBoost classifier achieved the highest AUC value of 0.999. The research comprehensively
analysed ML techniques employed for detecting breast cancer. The analysis unveiled
a widespread use of SVM, KNN, and DT in the extant scholarly literature. Moreover, recent
studies show a notable increase in the use of deep learning (DL) methodologies, including CNN
and RNN.

ML models are predominantly influenced by the data they are trained on and the patterns that
may be extracted. These models lack moral discernment, empathy, and ethical deliberation. In
disciplines such as law, justice, social work, and philosophical decision-making, where
intricate ethical deliberations, contextual factors, and moral tenets are important, exclusive
dependence on ML may result in choices that raise ethical concerns or lack human empathy. For
example, ML models alone may not sufficiently handle identifying the suitable penalty in legal
situations, making choices in child welfare services, or addressing intricate ethical
challenges that need consideration of individual circumstances and moral reasoning. These
scenarios frequently need human comprehension, compassion, and the capacity to decipher
intricate, context-dependent elements, which machine learning models may not fully encompass.
In instances of this nature, although ML can aid in examining data or providing insights, the
ultimate decision-making procedure may necessitate the inclusion of human judgement and
ethical deliberation to guarantee equitable, impartial, and ethically sound results.
Recognising these inherent constraints, a concerted endeavour is underway to include ethical
frameworks in advancing AI and ML. The primary objective of ethical AI frameworks is to
establish a harmonious alignment between artificial intelligence technology and human values,
as well as ethical standards. This alignment facilitates decision-making processes that are
characterised by responsibility and thoughtfulness. Nevertheless, the complete incorporation
of moral thinking and ethical judgement into AI systems continues to pose a significant
obstacle.

6. Conclusion

ML teaches computers to understand patterns from data in a specific domain by developing
mathematical models. In its most basic form, machine learning is a two-step process. First,
a model is constructed utilising sample data as input, referred to as the ‘training set’,
together with the correct outputs. Models can be created using a variety of ML algorithms,
such as Logistic Regression (LR). Following training, the model is evaluated using previously
unseen data, referred to as the ‘test set’. During the testing stage, the model predicts the
outputs of the test set, which it does with a certain level of accuracy. A model is considered
efficient if it performs well in both training and testing, and inefficient otherwise. The
current systematic review examined the various ML models used for detecting BC based on
accuracy, precision, and recall. It was found that despite such remarkable interest, many gaps
and shortcomings still need to be tackled as soon as possible. The previous works mainly
benefited from SVM, KNN, and DT in detecting BC. As stated in the literature, SVM performs
better for classification, and it can be further improved in combination with other solutions.
In the future, more comparative studies are required to make better comparisons. Different
deep learning-based techniques, such as CNN, DNN, RNN, DBN, and AE-based approaches, have
recently been developed to diagnose breast cancer. The most prominent deep-learning method,
CNN, has been used in multiple studies to detect breast cancer. Combining various risk factors
in breast cancer prediction modelling could aid early illness detection and help provide the
appropriate therapy. The advent of DL methodologies, particularly CNN and RNN designs, offers
a promising opportunity for the diagnosis of breast cancer. These methodologies have
considerable potential in effectively managing unstructured data, namely medical images, and
indicate enhanced performance in the extraction of features and classification. The assessment
underscored the critical necessity of tackling many issues, such as enhancing the
interpretability of machine learning models, managing unbalanced datasets, and guaranteeing
the resilience and applicability of the created algorithms. By incorporating approaches of
explainable artificial intelligence (AI) and undertaking thorough validation on a wide range
of datasets, it is possible to alleviate these issues. Regarding future work, the challenges
of image processing techniques used in the medical field require more consideration. The
integration of multimodal data in breast cancer diagnosis is still limited, since many studies
concentrate primarily on analysing specific types of data such as mammograms, genetic markers,
and histology. There is a need for comprehensive models that adeptly include several data
modalities, including imaging, genomics, proteomics, and clinical data, aiming to augment
accuracy and resilience in the domains of diagnosis and therapy prediction. Despite the
encouraging outcomes observed in controlled research investigations, the application of
machine learning models in clinical practice continues to face constraints in terms of
validation. Future research should thoroughly validate these models inside authentic clinical
environments to evaluate their effectiveness, practicality, and influence on patient outcomes.
Also, the primary focus of this study is on early detection and precision medicine,
emphasising the implementation of early detection

