Breast Cancer Prediction
Abstract
This thesis examines machine learning techniques for breast cancer risk prediction with the aim
of improving early detection. It compares machine learning algorithms that predict breast cancer
from clinical and demographic data. The results show that each algorithm has strengths and
weaknesses: Random Forest provides strong accuracy and precision, SVM has good predictive
ability, AdaBoost has good precision and recall, Gradient Boosting performs well overall,
Logistic Regression works well across several categories, and Decision Trees show limited
potential. The findings have implications for clinical practice and advance our understanding of
how machine learning might help detect breast cancer. The study also examines the potential for
integrating these models into routine healthcare practice, with a focus on early diagnosis and
proactive intervention. It recommends further research into more complex ensemble techniques,
the combination of data from a variety of omics sources, model simplification, and the ethics of
using machine learning models in medicine. By combining computational and clinical
methodologies, this research suggests that machine learning can help predict breast cancer risk.
The results support the goal of personalized medicine: improving patient outcomes through early
and accurate identification of those at risk.
Chapter 1: Introduction
Background
Breast cancer is the most frequently diagnosed cancer in women and a leading cause of cancer
death among women worldwide, and it therefore poses a considerable threat to global public
health. In 2020, breast cancer claimed the lives of more than 685,000 people worldwide, and
about 2.3 million new cases were diagnosed [1]. Traditional methods of assessing breast cancer
risk rely mainly on demographic, family, and reproductive characteristics. Although these
strategies have shown positive results [2], machine learning techniques such as neural networks,
logistic regression, random forests, and support vector machines can analyze and learn from more
complex datasets. This allows for the development of more accurate prediction models as well as
a more thorough understanding of the factors that raise the risk of breast cancer [3].
Such models may have an impact on clinical practice, public health initiatives, and, ultimately,
the lives of those who are at risk of developing breast cancer or who have already been affected
by the illness [4].
The creation of large and varied datasets that include genetic profiles, medical histories, and
imaging data offers a rare opportunity to build risk prediction models that are more accurate and
personalised than ever before [5]. How fully the potential of machine learning can be realized in
breast cancer risk prediction depends strongly on the field's ongoing advances [6].
This project attempts to employ machine learning to provide risk evaluations that are more exact,
comprehensive, and personalised. Furthermore, it aims to close the gaps in the methodologies
presently used to anticipate breast cancer risk [7]. The work covers breast cancer epidemiology,
risk factors, and prior efforts to apply machine learning for risk prediction [8].
In this research, we look at how various feature selection procedures affect model accuracy and
interpretability [9]. We draw on a broad range of machine learning approaches for constructing
breast cancer risk prediction models, including logistic regression, neural networks, support
vector machines, and random forests. To illustrate improvements and new insights, the
performance of the proposed models is compared with that of current risk assessment models. The
interpretability and explainability of the machine learning models should also be enhanced [10],
and methods to increase their accuracy in predicting the risk of breast cancer must be
investigated. A better understanding of the components that go into an individual's risk
assessment benefits both patients and physicians [11].
If these goals are met, this project will make a significant contribution to the development of
techniques for predicting the risk of breast cancer [12].
The major goal of this project is to build and test machine learning models for predicting the
likelihood of breast cancer occurrence, and thereby to improve the accuracy and breadth of the
risk assessment methodologies presently in use [13]. The data sources used in the research
include clinical records, genetic information, and medical imaging data. However, we do not
examine every form of data; rather, we focus on data that is easily accessible and relevant to
determining the risk of breast cancer.
A variety of machine learning approaches, including logistic regression, neural networks, support
vector machines, and random forests, are employed and refined in this study. It is important to
note that this thesis does not cover all machine learning algorithms. To identify important risk
indicators, the research analyses several feature selection approaches. Furthermore, it attempts
to increase the explainability and interpretability of the machine learning models [14].
The scope of this research includes ethical concerns about data security, consent, and any biases
introduced by machine learning algorithms. Our goal is to call attention to the new insights,
limitations, and improvements brought about by the proposed approach [15]. Future directions
include adapting the proposed model, exploring other data sources, and expanding the use of
machine learning in cancer therapy. The research presupposes that well-maintained and
well-organized datasets are freely available for analysis [16].
This work might lead to major advances in breast cancer research and treatment [17]. The
following key points demonstrate the significance of the findings. Combining a variety of data
sources with powerful machine learning algorithms may enable personalized breast cancer
therapy and prevention [18]. This project uses clinical, genomic, and imaging data to study
several risk factors for breast cancer, a complex illness [19].
Its recommendations may help breast cancer researchers and others employ machine learning [20].
This research may improve early diagnosis and risk assessment, with benefits for public health.
It also extends machine learning research that has medical applications [21].
Chapter 2: Literature Review
Breast Cancer Epidemiology
Breast cancer is a serious challenge to global public health: it affects millions of lives and
has a complex epidemiology. This section reviews the present state of breast cancer incidence,
prevalence, and mortality rates to provide a framework for further research into risk factors
[22]. It also analyzes the variables that influence survival rates and the prevalence of breast
cancer in various populations [23].
This section looks at the demographic trends of breast cancer incidence, such as age-specific
incidence rates, age at diagnosis, and any disparities between racial and ethnic groups [24]. It
also considers how genetics, cultural factors, and healthcare infrastructure contribute to these
disparities [25]. The principal emphasis is on how breast cancer epidemiology has evolved and on
the effect that socioeconomic and lifestyle variables have had on incidence rates over time [26].
We also identify how the mortality rates linked with breast cancer have developed over time, and
how the advent of new diagnostic and treatment techniques has influenced overall survival rates
[27]. We will look at the impact of public health initiatives. A further challenge is to analyze
new developments in breast cancer research, such as the increased prevalence of subtypes, and to
establish what barriers stand in the way of effective care and prevention efforts [28].
This section synthesizes data on the epidemiology of breast cancer. Its goal is to offer a
foundation for the upcoming risk factor analysis [29], which is needed to create effective risk
prediction models and to guide focused preventative measures.
The goal of this section is to provide a detailed overview of the many risk factors connected with
breast cancer by examining the most current research [30]. The age at which a woman reaches
menarche, the age at which she first gives birth, her parity, and whether she receives hormone
replacement therapy all contribute to her risk of breast cancer [31].
We examine the possible role that lifestyle and environmental variables, such as diet, physical
activity, alcohol use, and environmental pollution, may play in the development of breast cancer
[32]. With respect to gender and age, we examine trends in breast cancer risk by gender, with a
focus on how this risk changes with age, and the effects of gender-related risk variables on men
and women [33]. With respect to reproductive and menstrual history, numerous reproductive
factors, such as breastfeeding, age at menopause, and menstrual history, are explored in
connection with breast cancer risk [34].
The interplay of several risk variables is also discussed, through a thorough examination of the
intricate relationship between the cumulative influence of a wide variety of risk factors and the
likelihood of developing breast cancer [35]. The goal of this literature review is to provide the
groundwork for future research on machine learning models that may incorporate the multiple
elements that increase a woman's risk of breast cancer into a cohesive framework for risk
prediction.
We examine the Gail model, which predicts the risk of invasive breast cancer in women by
considering factors such as current age, age at first live birth, family history, and age at
menarche [36]. We then examine the Tyrer-Cuzick model, a computer algorithm that assesses the
lifetime risk of breast cancer based on factors such as age, family history, hormonal factors,
and reproductive history; the Claus model, which forecasts the risk of breast cancer from age,
family history, and the number of relatives who have been diagnosed with the illness [37]; and
the BCSC risk model, which estimates the likelihood of developing invasive breast cancer from
factors such as breast density, age, family history, and the results of prior breast biopsies
[38].
We also give a primer on the International Breast Cancer Intervention Study (IBIS) models, which
are used to forecast the likelihood of breast cancer and to aid in the selection of preventative
measures [39]. The IBIS-I model has several limitations: it relies on a small number of factors
and may be inaccurate for some subpopulations [40].
This section also looks at current efforts to enhance the accuracy of classic risk assessment
models by including imaging data. We review the most current and relevant data on conventional
risk assessment methods, as well as continuing initiatives to improve the accuracy and
usefulness of these models [41].
We investigate the prediction of breast cancer risk using logistic regression models, with an
emphasis on interpretability and usability [42]. We review research that employs decision trees
and random forests, with an emphasis on how ensemble learning aids in the development of
reliable risk prediction models, and studies that use support vector machines (SVMs) to predict
the probability of breast cancer, considering how effectively these models handle complex,
high-dimensional data [43].
We review breast cancer risk prediction using neural networks, with a focus on their capacity to
recognize complicated patterns in varied datasets [44]; studies that have employed genetic
algorithms and recursive feature elimination, among other feature selection approaches, to
discover which breast cancer risk factors are most relevant; and works that looked at ways to
make breast cancer risk prediction models more generalizable and robust by using ensemble
learning methods such as boosting and bagging [45].
We evaluate research that compares several machine learning algorithms to discover how well
they perform and where they fall short [46], and we summarize recent developments and innovative
strategies in machine learning for breast cancer risk prediction, taking into consideration
emerging trends and the inclusion of new technologies [47]. This section evaluates the existing
research on machine learning methodologies to give insights into the ever-changing landscape of
breast cancer risk prediction algorithms [48]. In the next chapters, we construct a more complete
model by analyzing the strengths and limitations of previous strategies [48].
The existing research has gaps and fundamental limitations that must be addressed [49]. To
obtain a complete picture of breast cancer risk, it is vital to fill these gaps, which this
section examines in depth [50]. It reviews the research on breast cancer risk factors and the
possible advantages of combining these variables with other data sets [51].
It is important to recognize that some research may not pay enough attention to how various
feature selection procedures affect modeling performance [52]. We also discuss the need for
extensive evaluations to discover the most effective breast cancer risk prediction models [53].
There is a paucity of research on the ethical dilemmas that may arise from the use of machine
learning algorithms to predict breast cancer risk [54]. The importance of ethical frameworks and
standards in healthcare for the proper and equitable use of predictive models is discussed here.
We also note research that may overlook the relevance of machine learning model
interpretability [55], and research that may not have thoroughly analyzed whether the resulting
models can be applied to other populations [56].
We examine the need for models that can adapt as risk variables change over the course of a
person's life [57], and the role of patient-centered approaches in increasing model adoption and
utility. This includes identifying studies that have not been validated or have not been
implemented in the real world [57]. Little of the research that seeks to predict breast cancer
risk makes an effort to understand and alleviate health inequities, so these inequities have not
been adequately examined [58]. The gaps discovered shape the study framework, allowing a more
thorough and meaningful examination [59]. Examining the studies discussed here may help to better
understand the complicated web of variables that affect the risk of breast cancer [60].
We examine the Health Belief Model, a theoretical framework that studies how people view their
own vulnerability to breast cancer and the severity of the illness, as well as the advantages and
disadvantages of taking preventative measures [61]. We also examine the Theory of Planned
Behavior (TPB), with a focus on how people's beliefs, subjective norms, and sense of agency
influence the intentions they form and the actions they take to lower their risk of breast
cancer [62].
The Ecological Model of Health Behavior is explained in detail. We analyze the Precision
Prevention paradigm, which highlights the need for interventions tailored to individuals' risk
profiles [63]. The aims of personalized breast cancer risk prediction are discussed, as well as
how they relate to this theoretical framework [64]. We also examine the Integrative Model of
Behavior Prediction in order to better understand the behaviors that lower the chance of
developing breast cancer [65].
We examine the many components of breast cancer risk using the Biopsychosocial Model, which
emphasizes the interplay of individual and societal variables in determining health outcomes
[66]. We discuss how this approach accounts for the time dimension and the complexity involved in
risk prediction [67]. We also discuss the need for cultural sensitivity in research that seeks to
predict the occurrence of breast cancer [68].
We review the research on risk communication theories, with a focus on how persuasive messages
could change people's perceptions of the risk of breast cancer and their motivation to take
preventive actions [69]. The goal of this section is to provide a conceptual foundation for
understanding the variables that affect the risk of breast cancer in women by situating the study
within a range of theoretical frameworks [70].
Chapter 3: Methodology
Because breast cancer remains one of the major causes of mortality worldwide, it is critical to
diagnose cases at an early stage and to accurately identify people who are at risk. Machine
learning algorithms have shown the ability to forecast the risk of breast cancer by examining a
broad variety of patient-related data and biomarkers. The goal of this thesis is to evaluate a
variety of machine learning algorithms and find the one that delivers the most accurate
prediction of breast cancer risk. Specifically, it assesses and compares the accuracy of seven
machine learning algorithms in predicting the risk of breast cancer from significant demographic
and clinical data: Gradient Boosting, AdaBoost, Support Vector Machine (SVM), Logistic
Regression (LR), Decision Tree (DT), K-Nearest Neighbors (KNN), and Random Forest (RF).
A large breast cancer dataset containing all essential information, such as genetic markers, age,
family history, and mammography results, should be chosen. The preprocessing steps that assure
data quality include categorical encoding, standardization of numerical features, and handling
of missing values. Techniques such as feature importance from tree-based models, correlation
analysis, or domain knowledge may be used to determine which features matter most for training
the model. In practice, features can be ranked and selected using feature importance scores or
Recursive Feature Elimination (RFE).
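The preprocessing steps named above (missing value handling, numerical standardization, and categorical encoding) could be sketched with Scikit-learn roughly as follows. This is a minimal illustration, not the thesis's actual pipeline: the toy records and the column names `age`, `tumor_size_mm`, and `family_history` are hypothetical, and median imputation is one assumed choice among several.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy records; the column names are illustrative only.
records = pd.DataFrame({
    "age": [45.0, 52.0, np.nan, 61.0],
    "tumor_size_mm": [12.0, np.nan, 30.5, 22.1],
    "family_history": ["yes", "no", "no", "yes"],
})
numeric_cols = ["age", "tumor_size_mm"]
categorical_cols = ["family_history"]

# Numeric columns: fill missing values with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns are one-hot encoded. sparse_threshold=0.0 forces a
# dense NumPy array as output so it is easy to inspect.
preprocess = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0.0,
)

features = preprocess.fit_transform(records)
print(features.shape)
```

Wrapping the steps in a single transformer means the same imputation and scaling statistics learned on the training data are reused on validation and test data.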
Each selected machine learning technique is implemented using appropriate libraries and
frameworks. Each model is trained and its hyperparameters are optimized using approaches such as
grid search, random search, or Bayesian optimization to get the best possible performance.
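As an illustration of hyperparameter optimization by grid search, the following sketch uses Scikit-learn's `GridSearchCV` on synthetic stand-in data. The parameter grid, the choice of Random Forest, and the ROC-AUC scoring are assumptions made for the example, not settings reported in this thesis.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption); a real study would use patient data.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Illustrative grid: every combination is evaluated with 5-fold CV.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=5,
)
search.fit(X, y)

best_params = search.best_params_
best_auc = search.best_score_
print(best_params, round(best_auc, 3))
```

Random search (`RandomizedSearchCV` in the same module) follows the same pattern but samples a fixed number of combinations, which scales better to large grids.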
Cross-validation methods are applied to avoid overfitting and to guarantee robustness. A variety
of evaluation metrics, including accuracy, precision, recall, F1-score, ROC-AUC, and confusion
matrices, are used to assess the performance of the models. Statistical tests (for example,
paired t-tests or Wilcoxon signed-rank tests) are used to check whether the differences in
performance between models are significant.
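A paired comparison of two models over the same cross-validation folds might look like the sketch below. It assumes synthetic data, ten stratified folds, and ROC-AUC as the per-fold score; the two models compared are chosen only for illustration. The test is SciPy's `wilcoxon`, applied to the paired fold scores.

```python
from scipy.stats import wilcoxon
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# A fixed random_state gives both models exactly the same folds,
# which is what makes the per-fold scores paired.
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

lr_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=folds, scoring="roc_auc"
)
dt_scores = cross_val_score(
    DecisionTreeClassifier(random_state=0), X, y, cv=folds, scoring="roc_auc"
)

# Wilcoxon signed-rank test on the paired per-fold differences.
statistic, p_value = wilcoxon(lr_scores, dt_scores)
print(lr_scores.mean(), dt_scores.mean(), p_value)
```

A small p-value suggests the difference between the two models is unlikely to be due to fold-to-fold variation alone, although scores from overlapping training sets are not fully independent, so the result should be read with some caution.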
The performance metrics of the models are evaluated and compared to determine which algorithm is
the most successful in predicting the likelihood of breast cancer. Visualization techniques such
as ROC curves and precision-recall curves may be used to display and explain the comparative
findings. The results are interpreted in light of the relative strengths of the various models:
we examine the causes of the differences in performance and weigh the implications for clinical
applications. The research's shortcomings and any possible biases are also addressed. Finally,
each machine learning algorithm is evaluated for whether it can forecast the risk of breast
cancer, the findings are summarized, and recommendations are offered for future research to
improve prediction accuracy or to explore adding alternative data sources.
Data Collection
Using machine learning to predict breast cancer risk requires a large dataset that is
representative of the population and includes all relevant attributes. Research databases,
healthcare organizations, and biomedical archives may provide trustworthy datasets. A breast
cancer dataset should cover several kinds of characteristics: medical history, genetic markers,
demographics, and clinical results. Reliable sources of breast cancer data include the UCI
Machine Learning Repository, SEER, and TCGA. Data protection and ethical standards must be met,
so following ethics committee or institutional review board requirements for handling patient
data is crucial.
Data must be preprocessed before being input into a machine learning model, so that errors,
outliers, and missing values are dealt with. Preprocessing includes imputation, normalization,
and categorical variable encoding. Feature engineering may be used to extract useful information
or to derive new features that enhance prediction; it draws on domain knowledge, variable
transformation, and feature construction. The dataset is split into three sets: training,
validation, and test. The training set trains the model, the validation set tunes
hyperparameters, and the test set evaluates performance. Class imbalance must be addressed,
especially if the dataset contains far more instances of one class (positive or negative) than
of the other. Class balance may be achieved via oversampling, undersampling, or generating
synthetic data. It is crucial to document the dataset, including its origin, variables,
transformations, and any modifications made during data collection and preparation.
Documentation is needed for reproducibility and transparency. To assure data accuracy,
consistency, and trustworthiness, rigorous quality checks are needed: check the data for
correctness, compare it to preset criteria, and double-check the inputs.
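The three-way split and one of the balancing options mentioned above (random oversampling of the minority class) can be sketched as follows. The 70/15/15 proportions, the synthetic data, and the 90/10 class ratio are assumptions for illustration. Oversampling is applied to the training set only, so the validation and test sets keep the original class distribution.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic imbalanced stand-in data: roughly 90% negative, 10% positive.
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.9, 0.1], random_state=0
)

# First split off 30%, then halve it into validation and test sets.
# stratify keeps the class ratio the same in every subset.
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.50, stratify=y_temp, random_state=0
)

# Random oversampling: draw minority rows with replacement until the
# two classes are the same size in the training set.
majority = X_train[y_train == 0]
minority = X_train[y_train == 1]
minority_upsampled = resample(
    minority, replace=True, n_samples=len(majority), random_state=0
)
X_balanced = np.vstack([majority, minority_upsampled])
y_balanced = np.concatenate([
    np.zeros(len(majority), dtype=int),
    np.ones(len(minority_upsampled), dtype=int),
])
print(len(X_train), len(X_val), len(X_test), np.bincount(y_balanced))
```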
To safeguard sensitive patient information, adopt proper data handling procedures and store the
dataset securely, in compliance with data governance requirements. By following these steps,
researchers can obtain a solid dataset on which machine learning algorithms can be trained to
predict breast cancer risk. Feature selection is crucial to machine learning breast cancer risk
prediction. Finding the dataset's most relevant and informative features may improve model
performance while reducing computational complexity. Exploratory data analysis (EDA), which
involves a thorough exploration of the dataset, may help reveal feature distributions,
correlations, and relationships with breast cancer risk. Visualisation methods like correlation
matrices, box plots, and histograms may help in understanding the data. Filter methods rank
features by the strength of their statistical relationship with the target variable. Chi-square
tests, mutual information, and correlation coefficients are typical measures. Features that show
little relationship with the target or do not improve predictive accuracy should be removed.
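A filter-style selection that ranks features by mutual information with the target could be sketched as below. The synthetic data and the choice of keeping four features are assumptions for the example; `SelectKBest` and `mutual_info_classif` are Scikit-learn utilities.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic stand-in data with 4 informative features out of 12 (assumption).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=4, n_redundant=0, random_state=0
)

# Mutual information estimation is randomized, so fix its seed for
# reproducible scores.
score_func = partial(mutual_info_classif, random_state=0)

# Keep the 4 features with the highest mutual information with the target.
selector = SelectKBest(score_func=score_func, k=4)
X_selected = selector.fit_transform(X, y)
kept_indices = selector.get_support(indices=True)
print(kept_indices, X_selected.shape)
```

Because a filter method scores each feature without training the final model, it is cheap to run, but it can miss features that are only useful in combination.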
Wrapper methods choose subsets of features by combining machine learning model performance with
iterative procedures such as forward/backward selection or recursive feature elimination; the
most effective feature subset is selected after model training. Embedded methods rely on the
intrinsic feature selection capabilities of certain machine learning algorithms, which
automatically uncover and prioritize informative features during model training. Lasso
regularization in Logistic Regression and feature importance in Random Forest are examples.
Principal Component Analysis (PCA) or t-Distributed Stochastic Neighbor Embedding (t-SNE) may
reduce dataset dimensionality while preserving key structure or variance. The
reduced-dimensional representation can then be used as input to machine learning algorithms.
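To make the wrapper and embedded approaches concrete, the sketch below applies recursive feature elimination and an L1-penalized logistic regression to the same synthetic data. The data, the number of features to keep, and the regularization strength `C=0.05` are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption), standardized so that coefficient
# magnitudes are comparable across features.
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=4, n_redundant=0, random_state=0
)
X = StandardScaler().fit_transform(X)

# Wrapper method: RFE repeatedly fits the model and drops the weakest
# feature until only the requested number remain.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=4)
rfe.fit(X, y)
rfe_kept = rfe.get_support(indices=True)

# Embedded method: an L1 (lasso) penalty drives some coefficients to
# exactly zero while the model is being trained.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
lasso.fit(X, y)
lasso_kept = [i for i, coef in enumerate(lasso.coef_[0]) if coef != 0.0]

print("RFE kept:", list(rfe_kept))
print("L1 kept:", lasso_kept)
```

The two methods need not agree; comparing their selections is one inexpensive way to see which features are consistently judged useful.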
To incorporate domain knowledge, consult medical professionals or other domain experts to
verify that the features are suitable. Expert knowledge may add domain-specific or clinical
factors that statistical analysis cannot identify. Machine learning methods may also use L1 or L2
regularisation during model training: L1 regularisation drives the coefficients of irrelevant
features to zero and so promotes sparsity, while L2 regularisation shrinks coefficients without
eliminating them. Make sure the selected features generalize and enhance model prediction
without overfitting to the training set, by checking how well the selected feature subset
performs on validation data.
Feature selection should be revisited as models are built iteratively. Use model performance
data to refine the feature subset by adding features or eliminating less valuable ones. Record
the features selected, their importance, and why they were included or removed. Recording and
reporting this information is crucial, and the final feature set should be presented
transparently and reproducibly in the research results. By adopting systematic feature selection
processes, researchers may increase model accuracy and interpretability and uncover crucial
markers that help predict breast cancer risk.
Model Development
Several algorithms must be built, trained, fine-tuned, and assessed to build a machine learning
breast cancer prediction model. Start from the preprocessed dataset produced during data
collection, and divide it into training, validation, and test sets before training, validating,
or evaluating any model. The algorithm's complexity, flexibility, and performance in similar
applications should be considered when choosing a breast cancer risk prediction model. The
candidate algorithms are Gradient Boosting, AdaBoost, SVMs, Logistic Regression, Decision Trees,
K-Nearest Neighbors, and Random Forest.
Each algorithm is implemented using a library such as Scikit-learn, TensorFlow, or PyTorch.
Initialize the models with their default parameters, then fine-tune the hyperparameters. Several
methods may be used for this, including grid search, random search, and Bayesian optimization.
To enhance model performance, adjust learning rates, regularization strengths, kernel types,
tree depths, and other settings. The models are then trained on the training dataset using the
best hyperparameters found. K-fold cross-validation is essential to assess model
generalizability and prevent overfitting.
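A compact way to train the seven algorithms named in this chapter under the same K-fold scheme is sketched below with Scikit-learn. The synthetic data, default hyperparameters, five folds, and accuracy scoring are assumptions for the example and do not reproduce the thesis's experiments.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Scale-sensitive models (SVM, LR, KNN) are wrapped in a pipeline so that
# scaling is refit inside each training fold and never sees held-out data.
models = {
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Random Forest": RandomForestClassifier(random_state=0),
}

folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
mean_accuracy = {
    name: cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, score in sorted(mean_accuracy.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {score:.3f}")
```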
Check the trained models' performance using the validation set. Accuracy, precision, recall,
F1-score, ROC-AUC, and confusion matrices may be used to assess prediction quality and
generalizability. Compare each algorithm's performance metrics to determine which ones best
predict breast cancer risk. ROC curves and precision-recall curves may help explain and analyze
the comparison results. If possible, use ensemble methods like model stacking or model averaging
to combine predictions from several models. This may enhance prediction accuracy.
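Model stacking, one of the ensemble options mentioned above, could be sketched with Scikit-learn's `StackingClassifier` as follows. The base models, the logistic regression meta-learner, and the synthetic data are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base models produce out-of-fold predictions (cv=5), and a logistic
# regression meta-learner is trained on those predictions.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
stack_accuracy = stack.score(X_val, y_val)
print(round(stack_accuracy, 3))
```

Stacking does not guarantee an improvement over the best single model, so the stacked model should be compared against its base models on the same validation data.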
The last step is to choose the model or models that most accurately predict breast cancer risk
according to validation set performance metrics. The chosen model must then be tested on a
held-out test dataset to verify real-world applicability, and checked to ensure that its
performance remains stable on new data. Analyzing the selected model(s) may reveal the
importance of individual features, the decision boundaries, and the factors that drive breast
cancer risk predictions. This is crucial to visualizing the model and understanding its
predictions.
Evaluation Metrics
Machine learning models for breast cancer risk prediction may be assessed using several metrics,
which reveal a great deal about the models' classification and predictive ability. Accuracy is
the proportion of all instances in the dataset that are predicted correctly: accuracy = number
of correct predictions / total number of predictions. Precision is the model's ability to
distinguish true positives from false positives among all the instances it predicts as positive;
it measures how trustworthy the positive predictions are.
Precision = true positives / (true positives + false positives). Recall, also called
sensitivity, evaluates the model's ability to detect all positive instances. It is the ratio of
correctly predicted positive instances to all actual positive instances: recall = true positives
/ (true positives + false negatives). The F1-score is the harmonic mean of precision and recall,
and so balances the two measures: F1 = 2 x precision x recall / (precision + recall). It is
especially useful for skewed (imbalanced) datasets.
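The metrics above can be computed directly from the four confusion matrix counts. The following is a small worked example; the counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical confusion matrix counts (illustrative, not thesis results).
tp = 40  # true positives
fp = 10  # false positives
fn = 5   # false negatives
tn = 45  # true negatives

# Accuracy: correct predictions over all predictions.
accuracy = (tp + tn) / (tp + tn + fp + fn)

# Precision: correct positive predictions over all positive predictions.
precision = tp / (tp + fp)

# Recall (sensitivity): correct positive predictions over all actual positives.
recall = tp / (tp + fn)

# F1-score: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

With these counts the model is right 85 times out of 100, while its precision (0.80) and recall (about 0.89) show that its errors lean towards false positives rather than missed cases.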
The ROC-AUC summarizes the trade-off between sensitivity (the true positive rate) and
specificity across classification thresholds: the ROC curve plots the true positive rate against
the false positive rate (1 - specificity), and the AUC is the area under that curve. It
evaluates how well the model separates positive from negative cases. A confusion matrix lists
the counts of true positives, true negatives, false positives, and false negatives. It is the
basis for computing accuracy, precision, and recall. Specificity, also called the true negative
rate, is the proportion of actual negative instances that the model correctly identifies as
negative. The Matthews Correlation Coefficient (MCC) allows a fair evaluation by taking all four
confusion matrix values into account, regardless of class sizes. With TP, TN, FP, and FN denoting
true positives, true negatives, false positives, and false negatives,
MCC = (TP x TN - FP x FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)).
Because the MCC includes every outcome (true positives, true negatives, false positives, and
false negatives), it is high only when the model does well on both the positive and the negative
class. These metrics must be considered together when assessing breast cancer risk prediction
systems. On an imbalanced dataset, higher accuracy does not always mean a better model. The
characteristics and goals of the breast cancer risk prediction task should guide the choice of
evaluation metrics.
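Specificity, the MCC, and the ROC-AUC can be illustrated with a short pure-Python sketch. The confusion matrix counts and the scores passed to the AUC function are hypothetical. The AUC is computed here through its pairwise interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case), which is equivalent to the area under the ROC curve but is not necessarily how a library computes it.

```python
import math

# Hypothetical confusion matrix counts (illustrative only).
tp, fp, fn, tn = 40, 10, 5, 45

# Specificity (true negative rate): correctly identified negatives
# over all actual negatives.
specificity = tn / (tn + fp)

# Matthews Correlation Coefficient: uses all four cells of the matrix.
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted risk scores for four patients.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

print(f"specificity={specificity:.3f} mcc={mcc:.3f} auc={auc:.2f}")
```

In the AUC example, three of the four positive-negative pairs are ranked correctly, so the AUC is 0.75.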
Chapter 4: Results and Discussion
The major goal of this project is to construct a prediction model that may be used to assess the
chance of developing breast cancer using machine learning methods, namely Gradient Boosting.
The research aimed to improve the accuracy and consistency of breast cancer risk assessments in
order to promote early identification and preventative intervention strategies. This chapter
describes the performance of the Gradient Boosting model in terms of accuracy, precision,
recall, and F1 score.
The Gradient Boosting model showed promise in predicting the risk of breast cancer, with an
overall accuracy of 82%. This indicates that the model classifies most cases correctly and can
discriminate between people at different levels of risk. The model's precision was 87%, where
precision is the proportion of predicted positive cases that are truly positive. This high
precision means the model produces relatively few false positives, which is required for an
effective risk prediction system.
Furthermore, the Gradient Boosting model achieved a recall of 93%. Recall is the proportion of
actual positive cases that the model identifies, so this high value indicates strong potential
for identifying people who are at risk of breast cancer. This is especially important in
healthcare, where failing to detect a positive case can have serious consequences. The F1 score,
the harmonic mean of precision and recall, was 90%. This statistic accounts for both false
positives and false negatives, and its value indicates a balanced relationship between precision
and recall.
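As a consistency check, the F1 score can be recomputed from the reported precision and recall.
The sketch below uses the standard harmonic-mean definition; the function name f1_score is chosen
for illustration, and the inputs are the Gradient Boosting precision (87%) and recall (93%)
reported above.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported Gradient Boosting values: precision 87%, recall 93%.
gb_f1 = f1_score(0.87, 0.93)
print(f"Gradient Boosting F1 = {gb_f1:.3f}")
```

The result rounds to 90%, which agrees with the reported F1 score.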
These results show that Gradient Boosting is an effective approach to predicting breast cancer
risk, with high accuracy, precision, recall, and F1 score. The model could plausibly be used in
clinical settings to support breast cancer identification and early intervention. Future work
could improve its generalizability and usefulness by making the model more robust, adding new
features, and validating it extensively across several datasets.
The next model evaluated is the Ada Boost algorithm. As with the other models, the aim is a
reliable tool for early diagnosis and risk assessment that can enhance healthcare outcomes.
The results cover the same performance criteria: accuracy, precision, recall, and F1 score. The
Ada Boost model reached an overall accuracy of 70%. Although this is lower than the accuracy of
several other models, it shows that the model still makes correct predictions in most cases.
In identifying positive cases, the model achieved a precision of 86%. This means that when the
model predicts that a person is at risk of breast cancer, it is correct 86% of the time, so the
number of false positives is small.
The Ada Boost model also achieved a recall of 86% and an F1 score of 86%. Recall measures how
well the model identifies all actual positive cases, while the F1 score balances precision and
recall. Because precision and recall are equal, the F1 score takes the same value, which shows
that the model is equally strong in sensitivity and precision. Ada Boost is therefore a useful
algorithm for predicting the likelihood of breast cancer: although its accuracy is lower than
that of other models, it achieves an acceptable balance between recall and precision. This
balance between detecting positive cases and avoiding false positives makes it a worthwhile
addition to breast cancer risk assessment tools. Future research may investigate ensemble
techniques, feature engineering, and larger datasets to improve the model's performance and
practicality in healthcare settings.
This section turns to the SVM ensemble approach, again with the aim of contributing accurate and
reliable models for early diagnosis and risk assessment. Its performance is reported in terms of
accuracy, precision, recall, and F1 score. The overall accuracy of the SVM ensemble model was
77%, which indicates that the model classifies most cases correctly and can differentiate
between people at different levels of risk. The model's precision was 85%, that is, the
proportion of predicted positive cases that were truly positive. With this level of precision,
the model keeps the number of false positives low, which is an important component of a
trustworthy risk prediction system.
In addition, the SVM model achieved a recall of 93%, meaning that it detected 93% of the actual
positive cases. The high recall shows strong potential for identifying patients at risk of
breast cancer. This is critical in healthcare settings because missed positive cases may have
catastrophic consequences. The F1 score, which combines precision and recall, was 89%. Because
it reflects both false positives and false negatives, it summarizes the model's performance in a
single figure. The obtained F1 score reflects a well-balanced trade-off between precision and
recall, indicating that the model is dependable in predicting the risk of breast cancer.
These findings suggest that the SVM ensemble approach is an effective way of predicting breast
cancer risk, combining reasonable accuracy with high precision, recall, and F1 score. The model
has the potential to be used in clinical practice, where it could help improve breast cancer
diagnosis and support timely treatment. Future research might focus on optimizing the ensemble's
composition, exploring new features, and validating the model rigorously on a range of datasets
to increase its generalizability and real-world usability.
The k-Nearest Neighbours (KNN) algorithm is considered next, again with the aim of supporting
reliable models for risk assessment and early identification. Its performance is reported in
terms of accuracy, precision, recall, and F1 score. The KNN algorithm attained an overall
accuracy of 70%, showing that it classifies most cases correctly and can distinguish between
people at different levels of risk. The model's precision was 74%: when it predicts that a
person is at risk of breast cancer, it is correct 74% of the time.
Furthermore, the KNN model had a recall of 75%, meaning that it identified 75% of the actual
positive cases. This indicates a moderate capacity to identify people who are at risk of breast
cancer. The F1 score, which combines precision and recall, was 74%. Because it reflects both
false positives and false negatives, it summarizes the model's performance in a single figure.
The F1 score obtained indicates an acceptable balance between recall and precision.
Overall, the KNN algorithm predicts breast cancer risk with acceptable accuracy, precision,
recall, and F1 score. Although other models are more accurate, the KNN model has a good balance
between recall and precision. Future research may involve hyperparameter tuning, feature
engineering, and additional data sources, which might improve the model's performance and make
it more useful in healthcare.
The Random Forest (RF) algorithm is examined next, with the same aim of building reliable models
for early diagnosis and risk assessment. The results cover the RF model's accuracy, precision,
recall, and F1 score.
The Random Forest algorithm predicted breast cancer risk with 77% accuracy, correctly
classifying most cases. Its precision was 87%, so the model produces few false positives when
evaluating breast cancer risk. The RF model had a recall of 86%, meaning that 86% of actual
positive cases were correctly identified. The recall score indicates that the algorithm detects
people at risk of breast cancer well.
The F1 score, which combines precision and recall, was 86%. Because it reflects both false
positives and false negatives, it summarizes the model's performance in a single figure. This F1
score shows that precision and recall are balanced. Overall, the Random Forest algorithm
predicts breast cancer risk with good accuracy, precision, recall, and F1 score. Given its
potential to support breast cancer detection and treatment, the model might be used clinically.
Hyperparameter tuning, feature importance analysis, and evaluation on a wider range of datasets
should be studied to improve the model's generalizability and practicality.
Decision Trees (DT) are evaluated next, again with the aim of supporting credible risk
assessment and early detection methods. The results cover all four DT performance metrics:
accuracy, precision, recall, and F1 score. Decision Trees predicted breast cancer risk with 64%
accuracy, the lowest of the models evaluated. The model's precision was 76%: when it predicts
that a person is at risk of breast cancer, it is correct 76% of the time. The DT model had a
recall of 70%, meaning that it identified 70% of the actual positive cases. The F1 score, which
combines precision and recall, was 73%. Because it reflects both false positives and false
negatives, it summarizes the model's performance in a single figure, and here it indicates a
reasonable balance between recall and precision.
Overall, the Decision Trees algorithm predicts breast cancer risk with modest accuracy and
acceptable precision, recall, and F1 score. Despite its low accuracy, the DT model has
reasonable recall and precision. Ensemble techniques, hyperparameter tuning, and additional
features may improve the model's performance and make it more relevant to real-world healthcare.
Finally, the Logistic Regression (LR) technique is examined for breast cancer risk prediction,
with the same aim of building reliable models for early diagnosis and risk assessment. The
results cover the LR model's accuracy, precision, recall, and F1 score.
Logistic Regression predicted breast cancer risk with an accuracy of 80%, correctly classifying
most cases. Its precision was 85%: when the model predicts that a person is at risk of breast
cancer, it is correct 85% of the time, which reflects a low number of false positives. Moreover,
the LR model had a recall of 91%, meaning that 91% of actual positive cases were correctly
identified.
The F1 score, which combines precision and recall, was 88%. Because it reflects both false
positives and false negatives, it summarizes the model's performance in a single figure. This F1
score shows that precision and recall are balanced. Overall, Logistic Regression predicts breast
cancer risk with good accuracy, precision, recall, and F1 score. Given its potential to support
breast cancer detection and treatment, the model might be used clinically. Future work may study
regularization methods, feature selection, and a wider variety of datasets to make the model
more applicable in real-world settings.
Chapter 5: Conclusion and Future Work
Conclusion
This research examined several machine learning algorithms for breast cancer early detection and
risk assessment, with the main goal of helping to create reliable predictive models. Several of
the methods produced good results, and each algorithm evaluated (Gradient Boosting, Ada Boost,
SVM, k-Nearest Neighbours, Random Forest, Decision Trees, and Logistic Regression) had strengths
and weaknesses. The Gradient Boosting model predicted risk with 82% accuracy, 87% precision, 93%
recall, and a 90% F1 score. Ada Boost had a lower accuracy of 70% but balanced precision and
recall of 86% each. The SVM ensemble approach performed well, with 77% accuracy, 85% precision,
93% recall, and an F1 score of 89%, which suggests that combining classifiers may improve
predictive ability. k-Nearest Neighbours reached 70% accuracy, 74% precision, 75% recall, and a
74% F1 score. Random Forest's 77% accuracy, 87% precision, 86% recall, and 86% F1 score show
that it balances precision and recall, the two metrics a risk prediction system most depends on.
The Decision Trees algorithm had the lowest accuracy at 64%, with 76% precision, 70% recall, and
a 73% F1 score. Although its accuracy is lower than that of the other models, its trade-off
between recall and precision is fair, so it may still be useful in certain contexts. Logistic
Regression was one of the strongest performers, with 80% accuracy, 85% precision, 91% recall,
and an 88% F1 score. Its consistent performance across metrics indicates that it is reliable for
predicting breast cancer risk. Taken together, these results show that machine learning can
support breast cancer risk prediction. By using these models, healthcare providers may take
preventive measures and identify possible issues early. Further work on modelling,
hyperparameter optimization, and ensemble methods may improve predictions and make them more
applicable to real-world situations. Closer collaboration between machine learning and
healthcare could substantially improve breast cancer risk prediction, patient care, and
outcomes.
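The figures reported in Chapter 4 can be collected and ranked in a few lines of Python. The
sketch below only restates the reported percentages; the dictionary layout and variable names
are chosen for illustration, and ranking by F1 score is one possible choice of criterion rather
than the method used in this study.

```python
# Metrics as reported in Chapter 4, in percent.
reported = {
    "Gradient Boosting":   {"accuracy": 82, "precision": 87, "recall": 93, "f1": 90},
    "Ada Boost":           {"accuracy": 70, "precision": 86, "recall": 86, "f1": 86},
    "SVM ensemble":        {"accuracy": 77, "precision": 85, "recall": 93, "f1": 89},
    "k-Nearest Neighbours": {"accuracy": 70, "precision": 74, "recall": 75, "f1": 74},
    "Random Forest":       {"accuracy": 77, "precision": 87, "recall": 86, "f1": 86},
    "Decision Trees":      {"accuracy": 64, "precision": 76, "recall": 70, "f1": 73},
    "Logistic Regression": {"accuracy": 80, "precision": 85, "recall": 91, "f1": 88},
}

# Rank models from highest to lowest reported F1 score.
ranked = sorted(reported.items(), key=lambda item: item[1]["f1"], reverse=True)

print(f"{'Model':<22}{'Acc':>5}{'Prec':>6}{'Rec':>5}{'F1':>5}")
for name, m in ranked:
    print(f"{name:<22}{m['accuracy']:>5}{m['precision']:>6}{m['recall']:>5}{m['f1']:>5}")

# Model with the highest reported accuracy.
best_by_accuracy = max(reported, key=lambda name: reported[name]["accuracy"])
```

On these figures, Gradient Boosting ranks first by both F1 score and accuracy, and Decision
Trees ranks last by both.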
Future Work
This thesis has shown how machine learning can predict breast cancer risk; however, there are
still many unexplored areas where further study might improve care and advance the science. The
following are potential topics for future work.
Innovative model fusion and ensemble training methods may improve prediction accuracy. Future
work could examine the different ways in which several models can be combined to improve
estimates. A more generalizable combined model may be created via stacking, blending, and
weighted ensembles.
Future work could also examine how feature engineering and selection affect model efficiency.
Identifying and incorporating new relevant features, and improving existing ones, may improve
the models' ability to distinguish between cases. Principal component analysis and recursive
feature elimination may improve feature selection.
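One of the simplest forms of weighted ensemble is weighted soft voting, in which each model's
predicted probability is averaged using a weight per model. The following is a minimal sketch
under stated assumptions: the function name, the probabilities, and the idea of using each
model's F1 score as its weight are all hypothetical and are not part of this study.

```python
def weighted_soft_vote(probabilities, weights, threshold=0.5):
    """Combine per-model positive-class probabilities into one prediction.

    probabilities: one predicted probability of the positive class per model.
    weights: one non-negative weight per model (for example, a validation score).
    Returns the weighted average probability and the resulting 0/1 label.
    """
    if len(probabilities) != len(weights):
        raise ValueError("probabilities and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one weight must be positive")
    combined = sum(p * w for p, w in zip(probabilities, weights)) / total_weight
    return combined, int(combined >= threshold)


# Hypothetical probabilities from three models for a single patient.
probs = [0.62, 0.48, 0.71]
# Assumed weights: here each model's F1 score is used as its weight.
weights = [0.90, 0.86, 0.88]

combined, label = weighted_soft_vote(probs, weights)
print(f"combined probability = {combined:.3f}, predicted label = {label}")
```

Stacking differs from this scheme in that a second model learns how to combine the base models'
outputs instead of using fixed weights.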
Integrating genomics, transcriptomics, and proteomics data, as in earlier multi-omics studies,
may help identify breast cancer risk factors. Combining several data modalities may also help
models uncover subtle risk-related patterns and relationships.
The interpretability and explainability of machine learning models for breast cancer prediction
should be improved. Tools that explain complex models such as Gradient Boosting and Random
Forest may make them more useful in clinical practice, and doctors would benefit from clear
insight into how a prediction was reached.
For clinical practice and real-world validation, rigorous tests should be conducted using a
range of large-scale datasets. To ensure generalizability, models must be tested on multiple
demographic groups and populations, and they must be evaluated in clinical settings together
with medical professionals.
Longitudinal studies and dynamic risk prediction models could be used to examine how breast
cancer risk changes over time. This might help tailor screening and preventive programs to
specific patients.
Machine learning algorithms for breast cancer prediction also raise ethical and societal issues.
To guarantee ethical implementation and avoid worsening healthcare inequalities, transparency,
accountability, and fairness must be addressed.
To integrate machine learning models smoothly into healthcare operations, public health
professionals, epidemiologists, and doctors must collaborate. User-friendly interfaces and
decision support tools should facilitate the clinical use of these models. If these lines of
study are pursued, early detection approaches, patient outcomes, precision and personalised
medicine, and breast cancer risk prediction may all improve.
Chapter 6: References
1. Behravan, H., Hartikainen, J. M., Tengström, M., Kosma, V. M., & Mannermaa, A. (2020).
Predicting breast cancer risk using interacting genetic and demographic factors and machine
learning. Scientific reports, 10(1), 11044.
2. Gao, Y., Li, S., Jin, Y., Zhou, L., Sun, S., Xu, X., ... & Wang, Y. (2022). An Assessment of the
Predictive Performance of Current Machine Learning–Based Breast Cancer Risk Prediction
Models: Systematic Review. JMIR Public Health and Surveillance, 8(12), e35750.
3. Ahmed, M. R., Ali, M. A., Roy, J., Ahmed, S., & Ahmed, N. (2020, December). Breast Cancer
Risk Prediction based on Six Machine Learning Algorithms. In 2020 IEEE Asia-Pacific
Conference on Computer Science and Data Engineering (CSDE) (pp. 1-5). IEEE.
4. Kim, G., & Bahl, M. (2021). Assessing risk of breast cancer: a review of risk prediction
models. Journal of breast imaging, 3(2), 144-155.
5. Islam, M. M., Haque, M. R., Iqbal, H., Hasan, M. M., Hasan, M., & Kabir, M. N. (2020). Breast
cancer prediction: a comparative study using machine learning techniques. SN Computer
Science, 1, 1-14.
6. El Haji, H., Souadka, A., Patel, B. N., Sbihi, N., Ramasamy, G., Patel, B. K., ... & Banerjee, I.
(2023). Evolution of Breast Cancer Recurrence Risk Prediction: A Systematic Review of
Statistical and Machine Learning–Based Models. JCO Clinical Cancer Informatics, 7, e2300049.
7. Rabiei, R., Ayyoubzadeh, S. M., Sohrabei, S., Esmaeili, M., & Atashi, A. (2022). Prediction of
breast cancer using machine learning approaches. Journal of Biomedical Physics &
Engineering, 12(3), 297.
8. Hou, C., Zhong, X., He, P., Xu, B., Diao, S., Yi, F., ... & Li, J. (2020). Predicting breast cancer in
Chinese women using machine learning techniques: algorithm development. JMIR medical
informatics, 8(6), e17364.
9. Macaulay, B. O., Aribisala, B. S., Akande, S. A., Akinnuwesi, B. A., & Olabanjo, O. A. (2021).
Breast cancer risk prediction in African women using random forest classifier. Cancer Treatment
and Research Communications, 28, 100396.
10. López, N. C., García-Ordás, M. T., Vitelli-Storelli, F., Fernández-Navarro, P., Palazuelos, C., &
Alaiz-Rodríguez, R. (2021). Evaluation of feature selection techniques for breast cancer risk
prediction. International Journal of Environmental Research and Public Health, 18(20), 10670.
11. Dembrower, K., Liu, Y., Azizpour, H., Eklund, M., Smith, K., Lindholm, P., & Strand, F. (2020).
Comparison of a deep learning risk score and standard mammographic density score for breast
cancer risk prediction. Radiology, 294(2), 265-272.
12. Ming, C., Viassolo, V., Probst-Hensch, N., Dinov, I. D., Chappuis, P. O., & Katapodi, M. C.
(2020). Machine learning-based lifetime breast cancer risk reclassification compared with the
BOADICEA model: impact on screening recommendations. British journal of cancer, 123(5), 860-
867.
13. Naji, M. A., El Filali, S., Aarika, K., Benlahmar, E. H., Abdelouhahid, R. A., & Debauche, O.
(2021). Machine learning algorithms for breast cancer prediction and diagnosis. Procedia
Computer Science, 191, 487-492.
14. Kabiraj, S., Raihan, M., Alvi, N., Afrin, M., Akter, L., Sohagi, S. A., & Podder, E. (2020, July).
Breast cancer risk prediction using XGBoost and random forest algorithm. In 2020 11th
international conference on computing, communication and networking technologies
(ICCCNT) (pp. 1-4). IEEE.
15. Gupta, P., & Garg, S. (2020). Breast cancer prediction using varying parameters of machine
learning models. Procedia Computer Science, 171, 593-601.
16. Khozama, S., & Mayya, A. M. (2021). Study the Effect of the Risk Factors in the Estimation of the
Breast Cancer Risk Score Using Machine Learning. Asian Pacific Journal of Cancer Prevention:
APJCP, 22(11), 3543.
17. Pawar, S., Sapate, S., & Sharma, K. (2020, April). Machine learning approach towards
mammographic breast density measurement for breast cancer risk prediction: An overview.
In Proceedings of the 3rd International Conference on Advances in Science & Technology
(ICAST).
18. Rawal, R. (2020). Breast cancer prediction using machine learning. Journal of Emerging
Technologies and Innovative Research (JETIR), 13(24), 7.
19. Tao, W., Lu, M., Zhou, X., Montemezzi, S., Bai, G., Yue, Y., ... & Lu, G. (2021). Machine learning
based on multi-parametric MRI to predict risk of breast cancer. Frontiers in Oncology, 11,
570747.
20. Telsang, V. A., & Hegde, K. (2020, December). Breast cancer prediction analysis using machine
learning algorithms. In 2020 International Conference on Communication, Computing and
Industry 4.0 (C2I4) (pp. 1-5). IEEE.
21. Su, Y. R., Buist, D. S., Lee, J. M., Ichikawa, L., Miglioretti, D. L., Bowles, E. J. A., ... & Hubbard,
R. A. (2023). Performance of statistical and machine learning risk prediction models for
surveillance benefits and failures in breast cancer survivors. Cancer Epidemiology, Biomarkers &
Prevention, 32(4), 561-571.
22. Gupta, A., Kaushik, D., Garg, M., & Verma, A. (2020, October). Machine learning model for
breast cancer prediction. In 2020 fourth international conference on I-SMAC (IoT in social,
mobile, analytics and cloud)(I-SMAC) (pp. 472-477). IEEE.
23. Akinnuwesi, B. A., Macaulay, B. O., & Aribisala, B. S. (2020). Breast cancer risk assessment and
early diagnosis using Principal Component Analysis and support vector machine
techniques. Informatics in medicine unlocked, 21, 100459.
24. Iparraguirre-Villanueva, O., Epifanía-Huerta, A., Torres-Ceclén, C., Ruiz-Alvarado, J., &
Cabanillas-Carbonel, M. (2023). Breast cancer prediction using machine learning models.
25. Kakileti, S. T., Madhu, H. J., Manjunath, G., Wee, L., Dekker, A., & Sampangi, S. (2020).
Personalized risk prediction for breast cancer pre-screening using artificial intelligence and
thermal radiomics. Artificial Intelligence in Medicine, 105, 101854.
26. Kumar, B. S., Daniya, T., & Ajayan, J. (2020). Breast cancer prediction using machine learning
algorithms. International Journal of Advanced Science and Technology, 29(3), 7819-7828.
27. Arefan, D., Mohamed, A. A., Berg, W. A., Zuley, M. L., Sumkin, J. H., & Wu, S. (2020). Deep
learning modeling using normal mammograms for predicting breast cancer risk. Medical
physics, 47(1), 110-118.
28. Fatima, N., Liu, L., Hong, S., & Ahmed, H. (2020). Prediction of breast cancer, comparative
review of machine learning techniques, and their analysis. IEEE Access, 8, 150360-150376.
29. Shamrat, F. J. M., Raihan, M. A., Rahman, A. S., Mahmud, I., & Akter, R. (2020). An analysis on
breast disease prediction using machine learning approaches. International Journal of Scientific
& Technology Research, 9(02), 2450-2455.
30. Prastyo, P. H., Paramartha, I. G. Y., Pakpahan, M. S. M., & Ardiyanto, I. (2020, April). Predicting
breast cancer: A comparative analysis of machine learning algorithms. In Proceeding
International Conference on Science and Engineering (Vol. 3, pp. 455-459).
31. Yang, Y., Tao, R., Shu, X., Cai, Q., Wen, W., Gu, K., ... & Zheng, W. (2022). Incorporating
polygenic risk scores and nongenetic risk factors for breast cancer risk prediction among Asian
women. JAMA network open, 5(3), e2149030-e2149030.
32. Kim, H., Lim, J., Kim, H. G., Lim, Y., Seo, B. K., & Bae, M. S. (2023). Deep Learning Analysis of
Mammography for Breast Cancer Risk Prediction in Asian Women. Diagnostics, 13(13), 2247.
33. Mendes, J., & Matela, N. (2021). Breast cancer risk assessment: a review on mammography-
based approaches. Journal of Imaging, 7(6), 98.
34. Tiwari, M., Bharuka, R., Shah, P., & Lokare, R. (2020). Breast cancer prediction using deep
learning and machine learning techniques. Available at SSRN 3558786.
35. Srivenkatesh, M. (2020). Prediction of breast cancer disease using machine learning
algorithms. International Journal of Innovative Technology and Exploring Engineering
(IJITEE), 9(4), 2868-2878.
36. Boeri, C., Chiappa, C., Galli, F., De Berardinis, V., Bardelli, L., Carcano, G., & Rovera, F. (2020).
Machine Learning techniques in breast cancer prognosis prediction: A primary
evaluation. Cancer medicine, 9(9), 3234-3243.
37. Humayun, M., Khalil, M. I., Almuayqil, S. N., & Jhanjhi, N. Z. (2023). Framework for detecting
breast cancer risk presence using deep learning. Electronics, 12(2), 403.
38. Jaiswal, V., Saurabh, P., Lilhore, U. K., Pathak, M., Simaiya, S., & Dalal, S. (2023). A breast
cancer risk predication and classification model with ensemble learning and big data
fusion. Decision Analytics Journal, 8, 100298.
39. Alzu’bi, A., Najadat, H., Doulat, W., Al-Shari, O., & Zhou, L. (2021). Predicting the recurrence of
breast cancer using machine learning algorithms. Multimedia Tools and Applications, 80, 13787-
13800.
40. Lehman, C. D., Mercaldo, S., Lamb, L. R., King, T. A., Ellisen, L. W., Specht, M., & Tamimi, R. M.
(2022). Deep learning vs traditional breast cancer risk models to support risk-based
mammography screening. Journal Of The National Cancer Institute, 114(10), 1355-1363.
41. Pawar, S. D., Sharma, K. K., & Sapate, S. G. (2021). Advances in machine learning and deep
learning approaches for mammographic breast density measurement for breast cancer risk
prediction: an overview. Design of Intelligent Applications Using Machine Learning and Deep
Learning Techniques, 125-143.
42. Monirujjaman Khan, M., Islam, S., Sarkar, S., Ayaz, F. I., Kabir, M. M., Tazin, T., ... & Almalki, F.
A. (2022). Machine learning based comparative analysis for breast cancer prediction. Journal of
Healthcare Engineering, 2022.
43. Saranya, G., & Pravin, A. (2020). A comprehensive study on disease risk predictions in machine
learning. International Journal of Electrical and Computer Engineering, 10(4), 4217.
44. Dadsetan, S., Arefan, D., Berg, W. A., Zuley, M. L., Sumkin, J. H., & Wu, S. (2022). Deep
learning of longitudinal mammogram examinations for breast cancer risk prediction. Pattern
recognition, 132, 108919.
45. Nemade, V., & Fegade, V. (2023). Machine Learning Techniques for Breast Cancer
Prediction. Procedia Computer Science, 218, 1314-1320.
46. Amiri Souri, E., Chenoweth, A., Cheung, A., Karagiannis, S. N., & Tsoka, S. (2021). Cancer
Grade Model: a multi-gene machine learning-based risk classification for improving prognosis in
breast cancer. British Journal of Cancer, 125(5), 748-758.
47. Wu, J., & Hicks, C. (2021). Breast cancer type classification using machine learning. Journal of
personalized medicine, 11(2), 61.
48. Afrash, M. R., Bayani, A., Shanbehzadeh, M., Bahadori, M., & Kazemi-Arpanahi, H. (2022).
Developing the breast cancer risk prediction system using hybrid machine learning
algorithms. Journal of Education and Health Promotion, 11.
49. Ray, A., Chen, M., & Gelogo, Y. (2020). Performance Comparison of Different Machine Learning
Algorithms for Risk Prediction and Diagnosis of Breast Cancer. In Smart Technologies in Data
Science and Communication: Proceedings of SMART-DSC 2019 (pp. 71-76). Springer
Singapore.
50. Maghsoudi, O. H., Gastounioti, A., Scott, C., Pantalone, L., Wu, F. F., Cohen, E. A., ... & Kontos,
D. (2021). Deep-LIBRA: An artificial-intelligence method for robust quantification of breast density
with independent validation in breast cancer risk assessment. Medical image analysis, 73,
102138.
51. Jain, S., & Kumar, P. (2020). Prediction of breast cancer using machine learning. Recent
Advances in Computer Science and Communications (Formerly: Recent Patents on Computer
Science), 13(5), 901-908.
52. Romanov, S., Howell, S., Harkness, E., Bydder, M., Evans, D. G., Squires, S., ... & Astley, S.
(2023). Artificial intelligence for image-based breast cancer risk prediction using
attention. Tomography, 9(6), 2103-2115.
53. Rane, N., Sunny, J., Kanade, R., & Devi, S. (2020). Breast cancer classification and prediction
using machine learning. International Journal of Engineering Research and Technology, 9(2),
576-580.
54. Xiao, J., Mo, M., Wang, Z., Zhou, C., Shen, J., Yuan, J., ... & Zheng, Y. (2022). The application
and comparison of machine learning models for the prediction of breast cancer prognosis:
retrospective cohort study. JMIR medical informatics, 10(2), e33440.
55. Khatun, T., Utsho, M. M. R., Islam, M. A., Zohura, M. F., Hossen, M. S., Rimi, R. A., & Anni, S. J.
(2021, September). Performance Analysis of Breast Cancer: A Machine Learning Approach.
In 2021 Third International Conference on Inventive Research in Computing Applications
(ICIRCA) (pp. 1426-1434). IEEE.
56. Alfian, G., Syafrudin, M., Fahrurrozi, I., Fitriyani, N. L., Atmaji, F. T. D., Widodo, T., ... & Rhee, J.
(2022). Predicting breast cancer from risk factors using SVM and extra-trees-based feature
selection method. Computers, 11(9), 136.
57. Salod, Z., & Singh, Y. (2020). A five-year (2015 to 2019) analysis of studies focused on breast
cancer prediction using machine learning: A systematic review and bibliometric analysis. Journal
of Public Health Research, 9(1), jphr-2020.
58. Okagbue, H. I., Adamu, P. I., Oguntunde, P. E., Obasi, E. C., & Odetunmibi, O. A. (2021).
Machine learning prediction of breast cancer survival using age, sex, length of stay, mode of
diagnosis and location of cancer. Health and Technology, 11, 887-893.
59. Battineni, G., Chintalapudi, N., & Amenta, F. (2020). Performance analysis of different machine
learning algorithms in breast cancer predictions. EAI Endorsed Transactions on Pervasive Health
and Technology, 6(23).
60. Yala, A., Mikhael, P. G., Strand, F., Lin, G., Smith, K., Wan, Y. L., ... & Barzilay, R. (2021).
Toward robust mammography-based models for breast cancer risk. Science Translational
Medicine, 13(578), eaba4373.
61. Gupta, S. R. (2022). Prediction time of breast cancer tumor recurrence using Machine
Learning. Cancer Treatment and Research Communications, 32, 100602.
62. Akter, L., Raihan, M., Raihan, M. M. S., Ghosh, M., & Alvi, N. (2022). Breast cancer risk
prediction using different clustering techniques. In International Conference on Innovative
Computing and Communications: Proceedings of ICICC 2021, Volume 2 (pp. 191-203). Springer
Singapore.
63. Dalal, S., Onyema, E. M., Kumar, P., Maryann, D. C., Roselyn, A. O., & Obichili, M. I. (2023). A
hybrid machine learning model for timely prediction of breast cancer. International Journal of
Modeling, Simulation, and Scientific Computing, 14(04), 2341023.
64. Li, H., Robinson, K., Lan, L., Baughan, N., Chan, C. W., Embury, M., ... & Giger, M. L. (2023).
Temporal Machine Learning Analysis of Prior Mammograms for Breast Cancer Risk
Prediction. Cancers, 15(7), 2141.
65. Alaa, A. M., Gurdasani, D., Harris, A. L., Rashbass, J., & van der Schaar, M. (2021). Machine
learning to guide the use of adjuvant therapies for breast cancer. Nature Machine
Intelligence, 3(8), 716-726.
66. Moturi, S., Tirumala Rao, S. N., & Vemuru, S. (2021). Risk Prediction-Based Breast Cancer
Diagnosis Using Personal Health Records and Machine Learning Models. In Machine Intelligence
and Soft Computing: Proceedings of ICMISC 2020 (pp. 445-460). Springer Singapore.
67. Li, J., Zhou, Z., Dong, J., Fu, Y., Li, Y., Luan, Z., & Peng, X. (2021). Predicting breast cancer
5-year survival using machine learning: A systematic review. PLoS ONE, 16(4), e0250370.
68. Guleria, K., Sharma, A., Lilhore, U. K., & Prasad, D. (2020). Breast cancer prediction and
classification using supervised learning techniques. Journal of Computational and Theoretical
Nanoscience, 17(6), 2519-2522.
69. Yang, X., Eriksson, M., Czene, K., Lee, A., Leslie, G., Lush, M., ... & Antoniou, A. C. (2022).
Prospective validation of the BOADICEA multifactorial breast cancer risk prediction model in a
large prospective cohort study. Journal of Medical Genetics, 59(12), 1196-1205.
70. Yavuz, Ö. Ç., Calp, M. H., & Erkengel, H. C. (2023). Prediction of breast cancer using machine
learning algorithms on different datasets. Ingeniería Solidaria, 19(1), 1-32.