7/5/2024
CET313 || Artificial Intelligence

Breast Cancer Detection and Prevention Using Machine Learning

Saroj Neupane
Student ID: 239756985
BSc (Hons) Computer Systems Engineering
International School of Management and Technology (ISMT), Kathmandu, Nepal
University of Sunderland, UK

ABSTRACT

Breast cancer, the most frequent cancer in women, is a global health issue that is often
identified late, when therapy is less successful. This project uses a well-known dataset of
569 patient records described by clinical variables, including tumor size, texture, and cell
nucleus characteristics, to improve breast cancer diagnosis and prediction using machine
learning. After data preprocessing, including missing-value handling and feature scaling,
feature importance and correlation analysis identify the features most relevant to accurate
prediction. Logistic Regression, Naive Bayes, Support Vector Machine, Random Forest,
K-Nearest Neighbors (KNN), Extreme Gradient Boosting, and Neural Networks are trained and
assessed on accuracy, precision, recall, and F1-score.

The study shows that machine learning can predict breast cancer. Logistic Regression achieved
the highest test accuracy (98.08%) in this investigation, making it the most dependable
model. These findings suggest machine learning could enhance breast cancer diagnosis and
prognosis. The project provides a framework for research on machine learning in breast cancer
diagnosis and treatment.

Keywords: breast cancer; machine learning; prognosis & diagnosis

Introduction
Breast cancer remains one of the leading cancers worldwide and is a major health concern
because of the many types and stages of the disease. Its development is influenced by factors
such as genetics, hormone fluctuations, and lifestyle choices. Early detection is especially
important, since the prognosis improves dramatically in early-stage cases, giving a better
chance of successful treatment and increased patient survival.

This report sets out to design a machine learning model for predicting breast cancer,
especially early-stage cancers, so that patient outcomes can be improved. The study analyses
patient demographics, medical history, and tumor characteristics to identify factors
correlated with patient survival and to find patterns that indicate breast cancer at the
stages where intervention is most effective.

The research focuses on factors such as age, tumor size, affected nodes, and histology. The
project comprises data collection and pre-processing, predictor selection, and model building
and assessment. Exploratory data analysis is carried out with data visualization software,
and the machine learning models are then trained and validated using cloud-based algorithms.

The algorithms employed in this study include Logistic Regression, Random Forest, Support
Vector Machine, and Gradient Boosting. These methods are assessed using performance
indicators such as accuracy, precision, recall, and F1-score to identify the most suitable
model. The aim is to improve the efficiency of breast cancer diagnosis through better data
analytics, which will help patients in the long run.

The purpose of this report is therefore to analyse how a machine learning model can be built
to predict breast cancer and so enhance early-stage diagnosis and treatment. Using patient
demographics, medical history, and other characteristics of breast cancer tumors, the study
aims to find patterns that better predict the probability of breast cancer at a stage when it
is still possible to treat and eventually cure the disease. In this way, the project
demonstrates how machine learning can be applied to increase the efficiency of breast cancer
diagnosis and how advanced data analytics plays a role in improving patient care.

Link to E-portfolio

I consider it both a challenge and a benefit to have worked on breast cancer prediction by
integrating machine learning. The project addressed a major healthcare problem and offered a
definite framework for conducting research with advanced machine learning techniques. The
activities I took part in during development include data pre-processing, feature selection,
and model assessment. These tasks gave me the chance to design solutions to technical issues
and to develop a richer understanding of predictive analysis.

The project involved numerous datasets, different machine learning techniques, and
adjustments to the models to increase predictive accuracy. This not only enhanced my
knowledge of computer science but also expanded my understanding of how machine learning can
be applied to clinical problems such as breast cancer detection. This motivates me to present
the progress and impact of this project in my e-Portfolio. You can explore my e-Portfolio by
clicking on the following link:

E-portfolio Link

Literature Review
The literature shows that breast cancer prediction and analysis have evolved significantly
with the help of machine learning (ML) and deep learning (DL). These techniques increase
certainty in diagnosis, improve risk assessment, and help detect the early signs that are
necessary for increasing patient survival rates.

Apart from model selection, major prerequisites for finalizing an ML model include
exploratory data analysis (EDA) techniques such as statistical modeling and data
visualization. These help in analysing and interpreting the data, identifying its structure,
and assessing its fitness for predictive models. EDA methods, including statistical tests and
visualization techniques, assist researchers in finding new relationships that improve the
effectiveness and accuracy of the predictive models.

Ahmed et al. (2023) showed that various groups of machine learning techniques can accurately
predict breast cancer. They employed a number of general machine learning algorithms, such as
Logistic Regression, Naive Bayes, Support Vector Machine, Random Forest, K-Nearest Neighbor,
and neural networks. Their work devoted considerable attention to feature selection and
feature extraction. They also found that more advanced algorithms, such as XGBoost and Random
Forest, are very stable and suitable for high-dimensional data. Feature selection was used
alongside hyperparameter tuning to mitigate model overfitting.

In the same vein, Gupta and Sharma (2022) applied different feature selection techniques; in
particular, recursive feature elimination (RFE) was combined with common ML methods. The
study noted that tuning hyperparameters with grid search and random search increases model
capability. The authors also stressed cross-validation for checking the stability of the
model and avoiding overfitting. Regularization techniques such as L1 and L2 were also noted
to enhance the stability of the models’ performance.

Johnson et al. (2022) discussed the Support Vector Machine alongside logistic regression,
which is applied to binary classification problems. They found that logistic regression,
which applies a sigmoid function, could differentiate between malignant and benign tumors.
This model is preferred for its simplicity and the ease of interpreting its results. They
also used k-fold cross-validation to test both models, logistic regression and SVM, and found
that the models performed equally well across all the data.

Lee et al. (2021) applied SVMs with kernel functions to the classification of breast cancer.
They noted that tuning the kernel parameters at the beginning of the learning process is very
important for maximizing learning. Their study found that SVMs with kernel functions could
deal with a large amount of data and classify it with very good accuracy.

ML and DL have, in general, improved the detection, diagnosis, and prognosis of breast cancer
at its early stages. The use of these techniques has improved diagnostic precision and
therefore patients’ outcomes. However, before an ML algorithm is applied, steps such as
statistical analysis, exploratory data analysis (EDA), and data visualization help in
understanding the data, the patterns within it, and its suitability for prediction models.

Gupta et al. (2021) provided a broader analysis of different ML models used to predict breast
cancer, including logistic regression, support vector machines (SVM), random forests, and
deep learning approaches. The study also pointed out that convolutional neural networks
(CNNs) outperformed traditional ML techniques in the classification of breast cancer images.

Another significant piece of work, by Singh and Sharma (2022), addressed feature selection
and identified principal component analysis (PCA) and RFE as the most relevant methods for
improving model performance. They showed that adding these techniques improved the
reliability of the SVM and random forest classifiers for breast cancer diagnosis.

Thanks to its interpretability and versatility, logistic regression, a regular choice for
binary classification, has been applied successfully to breast cancer prediction. Patel et
al. (2021) investigated logistic regression models with L1 and L2 regularization and found
that they perform better and are easier to interpret.

It is now confirmed that Random Forests along with XGBoost do remarkably well,
particularly with large datasets rich in features. Kumar and Verma (2023) affirm that these
models function at a superior degree regarding accuracy and robustness, mostly during the
process of hyperparameter tuning.

Utilization of deep learning models for breast cancer image analysis has become extensive,
especially with CNNs. The work of Lee et al. (2022) found that CNNs are adept at learning
features from mammographic images and achieving greater diagnostic accuracy than usual
ML methods.

Despite these innovations, the problems of data imbalance, model overfitting, and the need
for comprehensive labelled datasets remain. Future studies should focus on creating models
that deal adequately with these challenges.

Findings demonstrate that design choices affect accuracy, sensitivity, and specificity
depending on the dataset and the way features are chosen. For example:

• SVM and Random Forest models have shown high accuracy in classifying breast cancer cases
(Gupta & Sharma, 2022).

• XGBoost has been notable for handling large datasets with high-dimensional features
effectively (Ahmed et al., 2023).

• Artificial Neural Networks (ANNs) excel in learning complex patterns from imaging data,
enhancing diagnostic accuracy (Lee et al., 2021).

Methodology

The methodology for detecting and preventing breast cancer with machine learning starts with
installing and importing libraries. NumPy handles numerical computations on arrays, and
Pandas supports analysis and data manipulation through DataFrames. Matplotlib and Seaborn are
used for visualizations, which help in recognizing data patterns and relationships. For data
preprocessing and feature scaling, Scikit-learn’s tools are used; they also support
hyperparameter tuning through GridSearchCV.

The machine learning classification algorithms featured are Logistic Regression, Naïve Bayes,
Support Vector Machine (SVM), Random Forest, and K-Nearest Neighbor (KNN). Selecting these
models makes it possible to study how different classification methods behave. Accuracy
scores, confusion matrices, and classification reports are used to assess the performance of
each model.

The Scikit-learn Pipeline applies the same data transformations and modeling steps to both
training and testing data. In addition to the ML algorithms described above, XGBClassifier,
the classifier provided by XGBoost, is fast and performs effectively on large datasets.
TensorFlow is used for deep learning, allowing neural networks to be built and trained with
sophisticated methods. The artificial neural networks are designed with the Sequential model
from Keras and the Adam optimizer, which speeds up their training substantially. This
integration lets breast cancer detection and prevention benefit from the most powerful and
precise methods that machine learning offers.

During the process of developing this model to predict breast cancer, I relied on the
assistance of a few different libraries. The following is a list of the libraries that I have
utilized recently:

Data Collection

The dataset I used to train the model is the Breast Cancer Wisconsin (Diagnostic) dataset. It
can be accessed via the UCI Machine Learning Repository and the UW CS ftp server. I
downloaded the dataset and read it with the pandas library. The dataset contains 569 rows and
33 columns, and it contains null values that are removed later during data preprocessing. I
have also displayed the first 5 and last 5 rows.

Data Pre-processing and Visualization

A statistical description of the dataset helps us to understand it better. The dataset
contains empty or unwanted values, which are removed to obtain cleaner, more refined data. It
contains 357 (62.74%) benign cases and 212 (37.26%) malignant cases, which I have presented
in a pie chart.

The diagnosis is categorized as ‘M’ or ‘B’, where ‘M’ represents malignant and ‘B’ represents
benign. These two categories are later converted into numerical data, where ‘B’ equals 0 and
‘M’ equals 1. I cleansed the dataset to obtain refined data and saved the updated data in a
new CSV file for further use. I also analyzed the correlation between variables to see how
they influence the diagnosis, and visualized the correlation using a pair plot and a heatmap.
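
The label encoding and the class shares quoted above can be reproduced with a short sketch. It rebuilds only the `diagnosis` column from the stated class counts rather than loading the file, which is an assumption made to keep the example self-contained.

```python
import pandas as pd

# Rebuild only the label column from the class counts stated in the text.
diagnosis = pd.Series(["B"] * 357 + ["M"] * 212, name="diagnosis")

# Encode benign as 0 and malignant as 1.
encoded = diagnosis.map({"B": 0, "M": 1})

# Class shares in percent, rounded to two decimals.
shares = (diagnosis.value_counts(normalize=True) * 100).round(2)
print(shares)
```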

Based on the dataset, I have extracted valuable information about potential factors related
to breast cancer. With 569 observations, the dataset is a trustworthy sample for analysis.
Crucially, after cleaning there are no missing values, guaranteeing the reliability and
completeness of the dataset. With averages of 14.13, 19.29, and 91.97, respectively,
`radius_mean`, `texture_mean`, and `perimeter_mean` stand out among the essential features.
These characteristics draw attention to significant differences in cellular architecture,
which are essential for differentiating between benign and malignant cases.

The dataset exhibits persistent patterns, as evidenced by the low standard deviations of
features like `smoothness_mean`, `compactness_mean`, and `concavity_mean`. These
characteristics are excellent candidates for predictive modeling because of their consistency.
Conversely, "worst-case" metrics, including `radius_worst` (mean = 16.26) and
`concavity_worst` (mean = 0.272), show greater average values, indicating their importance
in detecting malignant situations.

Numerous cellular abnormalities are captured in the dataset, including `area_worst` (mean =
880.58) and `texture_worst` (mean = 25.67), which emphasize extremes that frequently
correspond with malignancy. The dataset's emphasis on cellular traits guarantees its resilience
for breast cancer diagnosis, even while demographic information such as age or gender is not
specifically given.

All things considered, the dataset is perfect for creating machine learning models due to its
consistency, completeness, and diversity. These revelations about important biological traits
highlight the dataset's potential to make a substantial contribution to breast cancer prevention
and early detection methods.

Missing Values

There are no missing values. None of the features, including `radius_mean`, `texture_mean`,
`perimeter_mean`, `area_mean`, and other measurements of cellular properties, have any
missing values. There were 569 entries in the dataset at first, but one column (`Unnamed:
32`) was eliminated because it provided no useful information. The dataset is intact after
preprocessing, containing 569 complete and reliable records, increasing its analytical
reliability.

Building successful machine learning models is made possible by this clean dataset, which
guarantees that bias brought about by missing data is eliminated. Additionally, the dataset is
made simpler by eliminating superfluous columns, allowing only significant attributes to be
highlighted. This degree of data integrity guarantees that the analysis is founded on correct
and comprehensive information and enhances the results' trustworthiness.
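
A minimal sketch of this check, assuming a toy frame in which the `Unnamed: 32` column is the only one holding nulls (the three sample rows are illustrative, not the full dataset):

```python
import numpy as np
import pandas as pd

# Toy frame: the real data has 569 rows; only the column layout matters here.
df = pd.DataFrame({
    "diagnosis": ["M", "B", "B"],
    "radius_mean": [17.99, 13.54, 13.08],
    "Unnamed: 32": [np.nan, np.nan, np.nan],
})

missing_before = df.isnull().sum()            # nulls per column
df_clean = df.drop(columns=["Unnamed: 32"])   # remove the uninformative column
missing_after = df_clean.isnull().sum()

print(missing_before)
print(missing_after)
```

Dropping the column rather than the rows keeps every record, which matches the statement that all 569 entries survive preprocessing.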

Correlation Heatmap

The correlation heatmap reveals multiple positive associations between "BREAST CANCER" and
particular attributes. Critical attributes like "radius_mean," "perimeter_mean,"
"area_mean," "concavity_mean," and "concave points_mean" exhibit significant relationships
with breast cancer diagnosis. The correlation coefficients for these factors are 0.73, 0.74,
0.71, 0.69, and 0.78, respectively, signifying a substantial connection with the probability
of breast cancer diagnosis.

Furthermore, "compactness_mean" and "concavity_mean" have a significant association of 0.66,
indicating that these traits frequently co-occur in patients diagnosed with breast cancer.
A significant correlation exists between "radius_mean" and "area_mean," with a coefficient
of 0.99, underscoring their strong link in breast cancer cases.

Notably, diminished correlations are noted for attributes like "texture_mean" and
"symmetry_mean," with coefficients of 0.32 and 0.15, signifying reduced associations with
breast cancer diagnosis. These discoveries underscore that attributes pertaining to size and
shape (such as radius, perimeter, and area) are pivotal in breast cancer prediction, in contrast
to attributes linked to texture or symmetry.

The following features had a correlation value < 0.07 with the target column:
fractal_dimension_mean / texture_se / smoothness_se / symmetry_se / fractal_dimension_se
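
The correlation-with-target screening can be sketched as follows. This is an illustration under assumptions, not the report's own code: it uses the copy of the same dataset bundled with scikit-learn, whose columns are named `mean radius`, `mean concave points`, and so on, instead of `radius_mean`, and whose target encodes malignant as 0, so the label is flipped to match the encoding used here.

```python
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)
df = data.data.copy()
# scikit-learn encodes malignant as 0; flip so that malignant = 1, benign = 0.
df["diagnosis"] = 1 - data.target

# Pearson correlation of every feature with the diagnosis column.
corr_with_target = df.corr()["diagnosis"].drop("diagnosis")

strongest = corr_with_target.sort_values(ascending=False).head(5)
weakest = corr_with_target[corr_with_target.abs() < 0.07]

print(strongest)
print(weakest)
```

The full matrix from `df.corr()` is what a heatmap function such as Seaborn's `heatmap` would take as input.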

Data Preparation

(Splitting data into training and testing data) Prior to model training, I divided the data
into training and testing sets. 75% of the data is used for training the model and 25% is
used for testing it. While splitting the data I specified a random state so that the same
data is used for training and testing every time.
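
A minimal sketch of such a split, assuming scikit-learn's bundled copy of the dataset and a hypothetical `random_state=42` (the report does not state which value was used):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# 75% training, 25% testing; the fixed random_state makes the split repeatable.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Splitting again with the same random_state yields the same partition.
_, x_test_again, _, _ = train_test_split(X, y, test_size=0.25, random_state=42)
same_split = np.array_equal(x_test, x_test_again)

print(x_train.shape, x_test.shape, same_split)
```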

Building Model Training

Logistic Regression, Decision Tree Classifier, and Random Forest Classifier are among the
most used algorithms for predicting categorical values. They are all efficient classification
algorithms.

Training model using Logistic Regression

This code illustrates the application of Logistic Regression for predicting breast cancer. The
model is trained using labeled data (`x_train`, `y_train`) and evaluated on unseen data
(`x_test`). The confusion matrix indicates 101 true negatives, 52 true positives, 1 false
positive, and 2 false negatives, so 153 of the 156 test cases are classified correctly. The
accuracy score of 98.08% underscores the model's efficacy in accurately categorizing
malignant and benign cases, establishing it as a dependable instrument for breast cancer
detection and analysis.
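
The accuracy can be checked directly from the four confusion-matrix counts given above. The precision, recall, and F1 values in this sketch are derived from those same counts for illustration; they are not figures quoted in the report.

```python
# Confusion-matrix counts for Logistic Regression as stated in the text.
tn, fp, fn, tp = 101, 1, 2, 52

total = tn + fp + fn + tp
accuracy = (tp + tn) / total          # share of all cases classified correctly
precision = tp / (tp + fp)            # share of predicted malignant that are malignant
recall = tp / (tp + fn)               # share of malignant cases that are found
f1 = 2 * precision * recall / (precision + recall)

print(f"n={total} accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```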
Other Models Used:
Alongside Logistic Regression, I employed various additional models to evaluate their
performance on the same dataset. This method facilitates an accurate evaluation and enables
the selection of the optimal model for the task at hand. The models I utilized are as follows:

K Nearest Neighbors Model

This code uses the K-Nearest Neighbors (KNN) algorithm to determine the appropriate
quantity of neighbors for breast cancer prediction. It traverses a spectrum of neighbor values
(1 to 4) and computes the accuracy score for each utilizing the Minkowski distance metric.
The accuracy scores are recorded and graphed to determine the value of `n_neighbors` that
yields the maximum predictive accuracy. This aids in refining the KNN model for maximal
classification efficiency.
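
The neighbor sweep described above could be sketched like this. It assumes scikit-learn's bundled copy of the dataset, standardized features, and a hypothetical `random_state=42`; `p=2` makes the Minkowski metric equal to Euclidean distance, which is scikit-learn's default.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# KNN is distance based, so features are standardized (fit on training data only).
scaler = StandardScaler().fit(x_train)
x_train, x_test = scaler.transform(x_train), scaler.transform(x_test)

knn_scores = {}
for k in range(1, 5):  # n_neighbors = 1, 2, 3, 4
    knn = KNeighborsClassifier(n_neighbors=k, metric="minkowski", p=2)
    knn.fit(x_train, y_train)
    knn_scores[k] = accuracy_score(y_test, knn.predict(x_test))

best_k = max(knn_scores, key=knn_scores.get)
print(knn_scores, best_k)
```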

Support Vector Machine Model

The code assesses the efficacy of an SVM model utilizing an RBF kernel for breast cancer
prediction through the optimization of the regularization parameter `C`. The model is trained
on the training dataset and evaluated on the test dataset for various `C` values `[0.5, 0.6, 0.7,
0.8, 0.9, 1.0]`. The accuracy for each `C` is computed and illustrated to demonstrate the
influence of `C` on the model's performance, facilitating the identification of the ideal
parameter for precise predictions.
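
A sketch of the `C` sweep, under the same assumptions as before (scikit-learn's bundled dataset, standardized features, hypothetical `random_state=42`); only the list of `C` values and the RBF kernel come from the text.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
scaler = StandardScaler().fit(x_train)
x_train, x_test = scaler.transform(x_train), scaler.transform(x_test)

c_values = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
svm_scores = []
for c in c_values:
    # Smaller C means stronger regularization (a softer margin).
    svc = SVC(kernel="rbf", C=c)
    svc.fit(x_train, y_train)
    svm_scores.append(accuracy_score(y_test, svc.predict(x_test)))

best_c = c_values[svm_scores.index(max(svm_scores))]
print(dict(zip(c_values, svm_scores)), best_c)
```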

Decision Tree Classifier

The provided code determines the ideal quantity of `max_leaf_nodes` for a Decision Tree
Classifier to enhance its efficacy. The classifier is trained iteratively with varying values of
`max_leaf_nodes` from 2 to 14, and the model is assessed using the accuracy score. The
model is trained on `x_train` and `y_train` for each value, with predictions generated on
`x_test`. The accuracy scores are recorded in a list and subsequently displayed against the
corresponding values of `max_leaf_nodes` to illustrate the variation in accuracy, aiding in the
identification of the ideal parameter.

Random Forest Classification Model

The provided function assesses the ideal quantity of `n_estimators` (trees) for a Random
Forest Classifier. The classifier is trained iteratively with `n_estimators` values varying from
10 to 29. The model is trained on `x_train` and `y_train` for each value, with predictions
generated on `x_test`. The accuracy score for each prediction is computed and recorded in a
list. The accuracy scores are displayed against the `n_estimators` values to illustrate the effect
of the number of trees on model performance, facilitating the selection of the optimal
`n_estimators` for the classifier.
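
The tree-count sweep might be sketched as below, again assuming scikit-learn's bundled dataset and a hypothetical `random_state=42` for both the split and the forest, so that repeated runs give the same scores.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

tree_counts = list(range(10, 30))  # n_estimators = 10 ... 29
rf_scores = []
for n in tree_counts:
    forest = RandomForestClassifier(n_estimators=n, random_state=42)
    forest.fit(x_train, y_train)
    rf_scores.append(accuracy_score(y_test, forest.predict(x_test)))

best_n = tree_counts[rf_scores.index(max(rf_scores))]
print(best_n, max(rf_scores))
```

Because the scores are measured on a single test split, small differences between neighboring `n_estimators` values are likely to be noise rather than a real effect.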

ANN (Neural Network)

To classify binary outcomes, this code creates an Artificial Neural Network (ANN) using the
Keras module available in TensorFlow. The network has four hidden layers of seven neurons
each, and every hidden layer uses the non-linear ReLU activation function. The output layer
has a single neuron with a sigmoid activation function, making it appropriate for binary
classification. The model is compiled with the Adam optimizer, binary cross-entropy loss
(which is used for binary classification), and accuracy as the evaluation metric. Finally,
the ANN is trained on the training dataset (`x_train`, `y_train`) with a batch size of 16 for
a total of 100 epochs.
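
As a rough stand-in for the Keras model, the same architecture can be sketched with scikit-learn's `MLPClassifier`. This is a swapped-in technique, not the report's TensorFlow code: it is used here only because it needs no extra dependency. It also assumes the bundled dataset, standardized inputs, and hypothetical random seeds.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
scaler = StandardScaler().fit(x_train)
x_train, x_test = scaler.transform(x_train), scaler.transform(x_test)

ann = MLPClassifier(
    hidden_layer_sizes=(7, 7, 7, 7),  # four hidden layers of seven neurons
    activation="relu",                # ReLU in every hidden layer
    solver="adam",                    # Adam optimizer
    batch_size=16,
    max_iter=100,                     # at most 100 epochs for the adam solver
    random_state=0,
)
# May emit a ConvergenceWarning if 100 epochs are not enough to converge.
ann.fit(x_train, y_train)

# For binary targets the output unit uses the logistic (sigmoid) function
# and the model is trained with log-loss, i.e. binary cross-entropy.
print(ann.out_activation_, [w.shape for w in ann.coefs_])
print(ann.score(x_test, y_test))
```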

XGBOOST

This code evaluates the performance of an XGBoost classifier (`XGBClassifier`) by altering
the number of estimators (`n_estimators`) between 10 and 29. For every value of
`n_estimators`, the model is trained on `x_train` and `y_train` with a maximum tree depth of
12 and a subsample ratio of 0.7. Predictions are generated on `x_test`, and the accuracy
scores are computed and recorded in a list. Ultimately, the accuracy scores are graphed
against the `n_estimators` values to evaluate their influence on model performance,
facilitating the determination of the best number of estimators.

CAT BOOST

This code illustrates the application of the `CatBoostClassifier` from the CatBoost library for
training a classification model. The classifier is initialized with default parameters and trained
on the dataset (`x_train`, `y_train`) via the `fit` method. The training process produces the
learning rate and metrics, including the learning loss for each iteration, as well as timing
information. CatBoost is exceptionally effective in managing categorical data and attaining
superior performance with less parameter adjustment. The output elucidates the model's
learning process with each iteration.

The code constructs a pandas DataFrame to compare the performance of several machine learning
models. The `Model` column lists the model names: Support Vector Machines (SVM), K-Nearest
Neighbors (KNN), Logistic Regression, Random Forest, Artificial Neural Networks (ANN),
Decision Tree, XGBoost, and CatBoost. The `Score` column holds the accuracy score of each
model (e.g., `acc_svc`, `acc_knn`, etc.).
The DataFrame is then sorted in descending order by the `Score` column using the
`sort_values` method, so that the best-performing model appears first. This allows a
straightforward comparison of the models' accuracies.
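
A minimal sketch of this comparison table. The accuracy variables such as `acc_svc` are replaced here by the percentages reported in the Outcomes and Analysis section, so the example runs without retraining the models.

```python
import pandas as pd

models = pd.DataFrame({
    "Model": ["SVM", "KNN", "Logistic Regression", "Random Forest",
              "ANN", "Decision Tree", "XGBoost", "CatBoost"],
    # Test accuracies in percent, as reported in Outcomes and Analysis.
    "Score": [96.79, 94.23, 98.08, 94.23, 95.51, 88.46, 95.51, 94.87],
})

# Highest score first; reset the index so row 0 is the best model.
ranked = models.sort_values(by="Score", ascending=False).reset_index(drop=True)
print(ranked)
```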

Model Optimization

(Hyperparameter tuning) Optimizing the model is important because it lets us use the best
possible model for our prototype. Models can be optimized by tuning their hyperparameters.
Hyperparameter tuning is the process of selecting the values of the model’s hyperparameters
that give the best performance. The grid search technique is used via GridSearchCV, available
in the scikit-learn library. GridSearchCV performs hyperparameter tuning and also applies
cross-validation.
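
A grid search of this kind might look like the following sketch. The report does not state which model or grid was tuned, so the choice of Logistic Regression, the `C` values, the 5-fold setting, and `random_state=42` are all assumptions. The scaler is placed inside a Pipeline so that it is refitted on the training part of every fold.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical grid: "clf__C" addresses the C parameter of the "clf" step.
param_grid = {"clf__C": [0.01, 0.1, 1, 10]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(x_train, y_train)

best_c = search.best_params_["clf__C"]
test_accuracy = search.score(x_test, y_test)  # uses the refitted best pipeline
print(best_c, search.best_score_, test_accuracy)
```

The test set is touched only once, after the search, so `test_accuracy` is not influenced by the tuning itself.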

Outcomes and Analysis

Among the assessed classification models for breast cancer diagnosis, Logistic Regression
performed best, with an accuracy of 98.08%. Support Vector Machines (SVM) achieved an
accuracy of 96.79%, demonstrating robust prediction capabilities. The Random Forest and
K-Nearest Neighbors (KNN) models each achieved an accuracy of 94.23%. The CatBoost and
XGBoost classifiers achieved competitive accuracies of 94.87% and 95.51%, respectively,
demonstrating their efficacy in managing intricate data patterns. The Artificial Neural
Network (ANN) matched XGBoost, with an accuracy of 95.51%.

Conversely, the Decision Tree model attained the lowest accuracy at 88.46%, underscoring
its deficiencies relative to ensemble and advanced techniques. The Logistic Regression model
is the most appropriate for this breast cancer classification task because of its exceptional
accuracy. Models such as SVM, ANN, and XGBoost yield favorable outcomes and may be
regarded as robust alternatives. Additional enhancements can be realized through
hyperparameter optimization, the exploration of supplementary features, or the application of
ensemble methods to augment the predictive efficacy of these models.

Conclusion
This study assessed multiple machine learning models to predict breast cancer, aiming to
determine the most precise and dependable classifier. Logistic Regression was the most
effective model, achieving an accuracy of 98.08%, followed by Support Vector Machines at
96.79%, and ensemble methods such as XGBoost at 95.51% and Random Forest at 94.23%.
Advanced methodologies, including ANN and CatBoost, exhibited robust predictive efficacy,
achieving accuracies of 95.51% and 94.87%, respectively.
The results demonstrate that both conventional models, including Logistic Regression and
SVM, and more sophisticated techniques, such as XGBoost and ANN, are effective for breast
cancer prediction. Nevertheless, simpler models such as Logistic Regression may be preferred
for their interpretability and ease of implementation.

Future endeavors may encompass the refinement of these models, the incorporation of
supplementary variables, and the investigation of ensemble methodologies to improve
forecast precision and resilience. This investigation underscores the promise of machine
learning in medical diagnostics, facilitating the development of more precise and automated
systems for breast cancer diagnosis.

References

Kavitha, R., Arivazhagan, D. and Amuthan, A. (2022) ‘Breast cancer prediction using
optimized machine learning algorithms and explainable AI techniques’, Journal of
Medical Imaging and Health Informatics, 12(3), pp. 567–576.
doi:10.1166/jmihi.2022.3776.

Mohapatra, S., Sabut, S. and Kandar, D. (2021) ‘Prediction of breast cancer using
hybrid machine learning techniques: A comprehensive approach’, 2021 International
Conference on Computational Intelligence and Data Science (ICCIDS) [Preprint].
doi:10.1016/j.procs.2021.02.104.

Sharma, A., Aggarwal, R.K. and Chawla, P. (2020) ‘Breast cancer detection using
adaptive ensemble learning techniques’, Neural Computing and Applications, 32(7),
pp. 3145–3157. doi:10.1007/s00521-019-04313-6.

Singh, G., Gupta, P.K. and Agarwal, D. (2019) ‘A comparative analysis of machine
learning techniques for breast cancer detection and diagnosis’, International Journal of
Advanced Research in Computer Science, 10(5), pp. 23–30.
doi:10.26483/ijarcs.v10i5.6487.

Shao, W., Cao, L. and Liu, X. (2021) ‘Deep learning-based early detection of breast
cancer: A novel approach using mammographic images’, Proceedings of 2021
International Conference on Biomedical Engineering and AI [Preprint].
doi:10.1109/icbeai.2021.9654137.

Kumar, A., Tyagi, A. and Kumar, D. (2020) ‘Predictive modeling for breast cancer
classification using machine learning algorithms and feature selection techniques’,
Biomedical Signal Processing and Control, 62, pp. 102083.
doi:10.1016/j.bspc.2020.102083.

