CHAPTER ONE
INTRODUCTION
Breast cancer is a complex disease that arises from a combination of genetic and environmental factors. It is the most common cancer in women worldwide and is responsible for a significant number of cancer-related deaths. Early detection and accurate prediction of breast cancer can significantly improve patient outcomes by enabling timely treatment and reducing mortality.
Machine learning algorithms have shown great potential in breast cancer prediction by analyzing large datasets of patient information. These algorithms can identify patterns in the data that are difficult for humans to detect, allowing for more accurate prediction of breast cancer risk. Machine learning models can also be trained on mammography images to detect abnormalities.
There are several machine learning algorithms that can be used for breast cancer prediction,
including logistic regression, decision trees, random forests, support vector machines, and
artificial neural networks. Each algorithm has its own strengths and weaknesses, and the choice
of algorithm depends on the characteristics of the dataset and the specific requirements of the
problem.
In this project, we propose to develop a machine learning model for breast cancer prediction and
compare the performance of different algorithms. We will collect breast cancer patient data from
publicly available databases and preprocess the data to remove missing values, outliers, and
irrelevant features. We will use feature selection techniques to identify the most relevant features
for breast cancer prediction and develop machine learning models using different algorithms. We
will evaluate the performance of the developed models using performance metrics such as accuracy, precision, recall, and F1-score. Finally, we will compare the performance of the different algorithms to identify the most effective approach.
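As an illustration, the sketch below shows how these metrics can be computed with the scikit-learn library; the label arrays are placeholder values, not project results.

```python
# Minimal sketch: computing the evaluation metrics named above with scikit-learn.
# y_true and y_pred are placeholder label arrays (1 = positive class, 0 = negative class).
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # ground-truth labels (illustrative values)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions (illustrative values)

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
```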
The results of this project will provide insights into the effectiveness of different machine
learning algorithms for breast cancer prediction and may lead to the development of more accurate and reliable prediction tools.
Breast cancer is a major public health concern worldwide, with approximately 2.3 million new
cases diagnosed annually and around 685,000 deaths reported each year. Early detection and
accurate prediction of breast cancer can significantly improve patient outcomes and reduce
mortality rates. Machine learning algorithms have shown great potential in breast cancer
prediction by analyzing large datasets of patient information. Several studies have been
conducted to develop machine learning models for breast cancer prediction and compare the performance of different algorithms.
One study by Alipourfard et al. (2020) compared the performance of logistic regression, decision
trees, random forests, support vector machines, and artificial neural networks in breast cancer
prediction using the Wisconsin Breast Cancer Dataset. The study found that artificial neural
networks had the highest accuracy and F1-score, followed by support vector machines and
random forests.
Another study by Naseem et al. (2020) compared the performance of logistic regression, decision
trees, random forests, and artificial neural networks in breast cancer detection using
mammography images. The study found that artificial neural networks had the highest accuracy
and sensitivity, followed by random forests and decision trees. Breast cancer epidemiology,
prevention, and pathology have been extensively studied, and several risk factors have been
identified, including age, family history, genetic mutations, reproductive history, and lifestyle factors. A key challenge, however, is to develop a model that can effectively predict breast cancer risk using relevant data sources. This requires collecting
and preprocessing large amounts of data from various sources such as medical records, genetic
information, and imaging data. Another challenge is to identify the most relevant features that
contribute to predicting breast cancer risk and selecting the appropriate machine learning
algorithm to achieve high accuracy in predictions. Additionally, the model needs to be validated
and tested on different datasets to ensure its generalizability and robustness. Finally, ethical
considerations must be taken into account in the development of such models, ensuring that
patient privacy and autonomy are preserved, and that the model is used in a responsible and
transparent manner.
Breast cancer is a significant public health issue, and early detection is critical for improving
patient outcomes. Machine learning can play a valuable role in predicting breast cancer risk and
aiding in the early detection of breast cancer. The development of accurate and reliable machine
learning models can help healthcare professionals make more informed decisions and provide
personalized treatment plans to patients based on their individual risk factors. Furthermore, the
use of machine learning can help reduce healthcare costs by identifying high-risk patients early
on and preventing the need for more expensive and invasive procedures in the later stages of the
disease. Therefore, the motivation behind this proposal for the prediction of breast cancer using supervised machine learning is to improve patient outcomes, reduce healthcare costs, and support more informed clinical decision making.
Supervised machine learning models were used in this study. Models were initially trained with demographic and laboratory features. The models were then trained with all demographic, laboratory, and mammographic features to evaluate the contribution of the mammographic data to prediction performance.
1.4. METHODOLOGY
We obtained the breast cancer dataset from the UCI (University of California, Irvine) Machine Learning repository and used it for this study. We note that better visualization of machine learning models can be achieved by plotting the prediction results.
B. Feature Selection: Feature selection is finding a subset of the original features using different selection techniques, without transforming the features themselves (a small illustration is given at the end of this section).
C. Feature Projection: Feature projection is the transformation of data from a high-dimensional space to a lower-dimensional space (with fewer attributes). Both linear and nonlinear reduction techniques can be used in accordance with the type of relationships among the features in the dataset.
D. Principal Component Analysis (PCA): PCA is used when we need to tackle the curse of dimensionality among data with linear relationships. It is a linear technique which compresses a large amount of data into a representation that captures the essence of the original data.
E. Model Selection: The most exciting phase in building any machine learning model is the selection of the algorithm. More than one kind of data mining technique can be applied to large datasets, but at a high level all those different algorithms can be classified into two groups:
Supervised learning: the method in which the machine is trained on data for which both the inputs and the expected outputs are provided, so that the model can compare its output with the correct one.
Unsupervised learning: the method of giving the machine information that is neither classified nor labeled and allowing the algorithm to analyze the given information without providing any direction.
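The following sketch illustrates how the feature selection and PCA-based feature projection steps described above could look in scikit-learn, using its built-in copy of the Wisconsin breast cancer data as a stand-in for the project dataset; the parameter choices (10 selected features, 2 components) are illustrative assumptions rather than final project settings.

```python
# Sketch of the feature selection (B) and PCA feature projection (C, D) steps
# on scikit-learn's built-in copy of the Wisconsin breast cancer data.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)            # 569 samples, 30 features

# Feature selection: keep the 10 features most associated with the class label.
selector = SelectKBest(score_func=f_classif, k=10)
X_selected = selector.fit_transform(X, y)

# Feature projection: PCA compresses the standardized data into 2 components.
X_scaled = StandardScaler().fit_transform(X)
X_projected = PCA(n_components=2).fit_transform(X_scaled)

print(X_selected.shape, X_projected.shape)             # (569, 10) (569, 2)
```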
This project has the potential to make significant contributions to the knowledge and understanding of breast cancer diagnosis and risk prediction. By utilizing machine learning techniques, we can develop a model that can predict the presence of breast cancer with high accuracy.
The project can also contribute to the development of new diagnostic tools and
personalized risk assessment strategies for breast cancer. By analyzing a large dataset of breast
cancer patients with various clinical and demographic features, we can identify new risk factors
and biomarkers that can be used to improve breast cancer screening and diagnosis.
Moreover, the project can also shed light on the effectiveness of different machine
learning algorithms for breast cancer prediction. By comparing and evaluating the performance
of various supervised learning algorithms such as logistic regression, decision trees, random
forests, and support vector machines, we can identify the most effective algorithm for breast
cancer prediction.
CHAPTER TWO
LITERATURE REVIEW
Breast cancer, a highly lethal and diverse disease in the current era, claims the lives of numerous
women globally. It stands as the most prevalent cancer among women, impacting approximately
10% of females at various life stages. Recent trends indicate a rising incidence rate, with a
reported 88% survival rate after five years and 80% after ten years from diagnosis. Early
detection is imperative in the monitoring process, given that breast cancer is the second leading
cause of female mortality after heart disease. The abnormal growth of fatty and fibrous breast tissue gives rise to tumors.
Tumors manifest as either benign, characterized by slow growth and lack of spread, or
malignant, exhibiting rapid growth, invasion of nearby tissues, and systemic dissemination.
These malignant tumors result from abnormal proliferation in the breast's fatty and fibrous
tissues, leading to different cancer stages (Noreen, Liu, Sha, & Ahmed, 2020).
Figure 2.1 illustrates the diverse types of breast cancer. Ductal Carcinoma in Situ (DCIS), a non-
invasive cancer, occurs when abnormal cells remain confined to the breast ducts and do not spread beyond them. Invasive Ductal
Carcinoma (IDC), also known as infiltrative ductal carcinoma, involves the widespread
distribution of abnormal breast cells. Mixed Tumors Breast Cancer (MTBC), or invasive
mammary breast cancer, arises from abnormal duct and lobular cells. Lobular Breast Cancer
(LBC), occurring within the lobule, elevates the risk of other invasive cancers. Mucinous
Breast Cancer (MBC), also known as colloid breast cancer, results from invasive ductal cells
spreading around the duct. Inflammatory Breast Cancer (IBC), the final type, induces breast
swelling and reddening, representing a fast-growing cancer stemming from lymph vessel blockage. Different people experience different warning signs of breast cancer; some have many, others have only one or two. Some people do not have any signs or symptoms at all. The most common signs and symptoms include:
• Skin redness;
• Dimpling or puckering;
• Fluid, other than breast milk, from the nipple, especially if it’s bloody;
• Scaly, red or swollen skin on the breast, nipple or areola (the dark area of skin that is around the nipple).
Breast ultrasound: A machine that uses sound waves to make pictures, called sonograms, of areas inside the breast.
Diagnostic mammogram: If you have a problem in your breast, such as lumps, or if an area of
the breast looks abnormal on a screening mammogram, doctors may have you get a diagnostic mammogram, which is a more detailed X-ray of the breast.
Breast magnetic resonance imaging (MRI): A kind of body scan that uses a magnet linked to a
computer. The MRI scan will make detailed pictures of areas inside the breast.
Biopsy: This is a test that removes tissue or fluid from the breast to be looked at under a
microscope and do more testing. There are different kinds of biopsies (for example, fine-needle aspiration and core biopsy).
Now, as an innovation, we go in for a more accurate and effective way of detecting cancer, hence the use of artificial intelligence (AI): computer programs that complete tasks which would otherwise require human intelligence. AI algorithms are capable of learning, planning, and reasoning. Within AI we have machine learning and deep learning; Figure 1.2 shows the relationship between them. AI technologies mimic human thought patterns to facilitate the digital transformation of society. AI
systems perceive their environment, deal with what they perceive, solve problems and act to help
with tasks to make everyday life easier. The following are ways in which AI has helped in day-to-day life:
• Voice Assistants: Digital assistants like Siri, Google Home, and Alexa use AI- backed
Voice User Interfaces (VUI) to process and decipher voice commands. AI gives these
applications the freedom to not solely rely on voice commands but also leverage vast amounts of data.
• Entertainment Streaming Apps: Streaming giants like Netflix, Spotify, and Hulu are
continually feeding data into machine learning algorithms to make the user experience
seamless.
• Smart Input Keyboards: The latest versions of mobile keyboard apps combine AI-based autocorrection and next-word prediction to make typing easier.
• Navigation and Travel: The work of AI programmers behind navigation apps like
Google Maps and Waze never ends. Yottabytes of geographical data, which are updated continually, are processed along with satellite images.
• Security and Surveillance: It is nearly impossible for a human being to keep a constant
eye on too many monitors of a CCTV network at the same time. So, naturally, we have felt
the need to automate such surveillance tasks and further enhance them by leveraging AI.
• Internet of Things: The confluence of AI and the Internet of Things (IoT) opens up a
plethora of opportunities to develop smarter home appliances that require minimal human
interference to operate. While IoT deals with devices interacting with the internet, the AI component enables these devices to learn from the data they generate.
• Facial Recognition Technologies: The most popular application of this technology is in
the Face ID unlock feature in most of the flagship smartphone models today. The biggest
challenge faced by this technology is widespread concern around the racial and gender bias in such systems.
AI has also found broad application in medicine and the life sciences. Common applications include diagnosing patients, end-to-end drug discovery and development, and personalized treatment. In particular, the healthcare domain has witnessed an increasing integration of machine learning techniques. A seminal study conducted by Smith et al. (2018) developed a model for predicting cardiovascular diseases using a diverse set of patient data. Leveraging a support vector machine algorithm, the
model showcased high accuracy in discerning patterns indicative of cardiovascular risks. This
underscores the potential of machine learning in contributing to the early diagnosis and
prevention of cardiovascular diseases. In a parallel effort, Jones and colleagues (2019) explored
the application of decision trees in predicting the onset of diabetes based on patient
demographics, lifestyle factors, and genetic markers. The decision tree algorithm exhibited
notable accuracy, shedding light on the intricate interplay of variables influencing diabetes risk.
This case study exemplifies the adaptability of machine learning approaches to diverse disease
domains, providing valuable insights into the nuanced factors contributing to disease
susceptibility.
Transitioning to the realm of oncology, a study by Chen et al. (2020) stands out for its application of machine learning to cancer prognosis. Employing a random forest algorithm, the model assimilated radiological imaging data to forecast the
likelihood of tumor progression. The findings underscore the potential of machine learning not
only in disease prediction but also in tailoring treatment strategies based on individualized risk
assessments. While these case studies predominantly focus on non-cancerous diseases, their
methodologies and outcomes offer pertinent lessons for the domain of breast cancer prediction.
The ability of machine learning models to extract meaningful patterns from diverse datasets, as
demonstrated in these studies, forms a solid foundation for our endeavor to construct an accurate breast cancer prediction model.
In a more recent exploration by Wang et al. (2021), the researchers employed deep learning techniques for breast cancer detection. Trained with multi-modal data, including imaging and genetic information, the model exhibited
promising results in early detection. This underscores the evolving landscape of machine learning in addressing diseases with complex etiologies.
As we navigate through these case studies, it becomes evident that the versatility of machine
learning transcends disease boundaries, offering a promising avenue for the development of our
predictive model for breast cancer. The amalgamation of diverse algorithms and data types in
these studies sets a precedent for our exploration into tailoring a comprehensive and accurate predictive model for breast cancer.
2.4 Review of Previous Works on Machine Learning for General Diseases Prediction
Extensive work was carried out in the field of Artificial Intelligence, especially Machine
Learning, to detect common diseases. Dahiwade et al. (2021) proposed an ML-based system that predicts common diseases. The symptoms dataset was imported from the KAGGLE ML repository, where it contained symptoms of many common diseases. The system used CNN and KNN algorithms, and the proposed solution was supplemented with more information that concerned the living habits of
the tested patient, which proved to be helpful in understanding the level of risk attached to the
predicted disease. Dahiwade et al. compared the results between KNN and CNN algorithm in
terms of processing time and accuracy. The accuracy of CNN was 84.5%, and its processing time was also compared against that of KNN.
In light of this study, the findings of Chen et al. 2019 also agreed that CNN outperformed typical
supervised algorithms such as KNN, NB, and DT. The authors concluded that the proposed
model scored higher in terms of accuracy, which is explained by the capability of the model to
detect complex nonlinear relationships in the feature space. Moreover, CNN detects features with
high importance that renders better description of the disease, which enables it to accurately
predict diseases with high complexity. This conclusion is well supported and backed with
empirical observations and statistical arguments. Nonetheless, the presented models lacked
details, for instance, neural networks parameters such as network size, architecture type, learning
rate and back propagation algorithm, etc. In addition, the analysis of the performances is only
evaluated in terms of accuracy, which weakens the validity of the presented findings. Moreover,
the authors did not take into consideration the bias problem that is faced by the tested algorithms.
In illustration, the incorporation of more feature variables could immensely ameliorate the
performance metrics of underperforming algorithms. Uddin et al. (2019) compared various
supervised ML techniques. In their study, extensive research efforts were made to identify those
studies that applied more than one supervised machine learning algorithm on single disease
prediction. Two databases (i.e., Scopus and PubMed) were searched for different types of search
items. Thus, they selected 48 articles in total for the comparison among variants supervised
machine learning algorithms for dis- ease prediction. They found that the Support Vector
Machine (SVM) algorithm is applied most frequently (in 29 studies) followed by the Naïve
Bayes algorithm (in 23 studies). However, the Random Forest (RF) algorithm showed superior
accuracy comparatively. Of the 17 studies where it was applied, RF showed the highest accuracy
in 9 of them, i.e., 53%. This was followed by SVM which topped in 41% of the studies it was
considered.
Sengar et al. 2019 attempted to detect breast cancer using ML algorithms, namely RF, Bayesian
Networks and SVM. The researchers obtained the Wisconsin original breast cancer dataset from
the KAGGLE repository and utilized it for comparing the learning models in terms of key
parameters such as accuracy, recall, precision, and area of ROC graph. The classifiers were
tested using K-fold validation method, where the chosen value of K is equal to 10. The
simulation results have proved that SVM excelled in terms of recall, accuracy, and precision.
However, RF had a higher probability in the correct classification of the tumor, which was
implied by the ROC graph. In contrast, Yao experimented with various data mining methods
including RF and SVM to determine the best suited algorithm for breast cancer prediction. Per
results, the classification rate, sensitivity, and specificity of Random Forest algorithm were
96.27%, 96.78%, and 94.57%, respectively, while SVM scored an accuracy value of 95.85%, a
sensitivity of 95.95%, and a specificity of 95.53%. Yao came to the conclusion that the RF
algorithm performed better than SVM because the former provides better estimates of
information gained in each feature attribute. Furthermore, RF is the most adequate at breast
diseases classification, since it scales well for large datasets and presents lower chances of variance and data overfitting. The studies advantageously presented multiple performance
metrics that solidified the underlined argument. Nevertheless, the inclusion of the preprocessing
stage to prepare raw data for training proved to be disadvantageous for ML models. According to
Yao, omitting parts of data reduces the quality of images, and therefore the performance of the
ML algorithm is hindered.
Noreen Fatima et al. (2020) performed a comparative review of machine learning techniques and
analyzed their accuracy across various journals. Her main focus is to comparatively analyze
different existing Machine Learning and Data Mining techniques in order to find out the most
appropriate method that will support the large dataset with good accuracy of prediction. She
found out that machine learning techniques were used in 27 papers, ensemble techniques were
used in 4 papers, and deep learning techniques were used in 8 papers. She concluded by saying
that each technique is suitable under different conditions and on different types of datasets; after the comparative analysis of these algorithms, it was found that the SVM machine learning algorithm is the most suitable for the prediction of breast cancer. Different researchers have provided analyses of prediction algorithms using the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, and these analyses show that the accuracy of the SVM algorithm is consistently the highest.
Delen et al. (2005) used artificial neural networks, decision trees and logistic regression to develop prediction models for breast cancer survival by analyzing a large dataset, the SEER cancer incidence database. Two popular data mining algorithms (artificial neural networks and decision trees) were used, along with a commonly used statistical method (logistic regression) to
develop the prediction models using a large dataset (more than 200,000 cases). 10-fold cross-
validation method was used to measure the unbiased estimate of the three prediction models for
performance comparison purposes. The results indicated that the decision tree (C5) is the best
predictor with 93.6% accuracy on the holdout sample (this prediction accuracy is better than any
reported in the literature), artificial neural networks came out to be the second with 91.2%
accuracy and the logistic regression models came out to be the worst of the three with 89.2%
accuracy. The comparative study of multiple prediction models for breast cancer survivability
using a large dataset along with a 10-fold cross-validation provided us with an insight into the
relative prediction ability of different data mining methods. Using sensitivity analysis on neural
network models provided us with the prioritized importance of the prognostic factors used in the
study.
Lundin et al. (1999) used ANN and logistic regression models to predict 5-, 10-, and 15-year breast
cancer survival. They studied 951 breast cancer patients and used tumor size, axillary nodal
status, histological type, mitotic count, nuclear pleomorphism, tubule formation, tumor necrosis,
and age as input variables. In this study, they showed that data mining could be a valuable tool in
identifying similarities (patterns) in breast cancer cases, which can be used for diagnosis,
prognosis, and treatment purposes. The area under the ROC curve (AUC) was used as a measure
of accuracy of the prediction models in generating survival estimates for the patients in the
independent validation set. The AUC values of the neural network models for 5-, 10- and 15-
year breast cancer-specific survival were 0.909, 0.886 and 0.883, respectively. The corresponding
AUC values for logistic regression were 0.897, 0.862 and 0.858. Axillary lymph node status (N0
vs. N+) predicted 5-year survival with a specificity of 71% and a sensitivity of 77%. The
sensitivity of the neural network model was 91% at this specificity level. The rate of false
predictions at 5 years was 82/300 for nodal status and 40/300 for the neural network. When
nodal status was excluded from the neural network model, the rate of false predictions increased
only to 49/300 (AUC 0.877). An artificial neural network is very accurate in the 5-, 10- and 15-
year breast cancer-specific survival prediction. The consistently high accuracy over time demonstrates that neural networks can be important tools for cancer survival prediction.
Yawen Xiao et al. note that breast cancer is a common disease among women. Their research work demonstrated a new system embedding a deep learning-based unsupervised feature extraction algorithm. A stacked auto-encoder was combined with a support vector machine technique to predict breast cancer. The proposed method was tested using the Wisconsin Diagnostic Breast Cancer data set, and the results show that the SAE-SVM approach achieved high prediction accuracy.
Junaid Ahmad Bhat et al. (2021) developed a new tool to detect breast cancer disease at an early stage. In this research work the authors presented preliminary results of the BCDM project developed using Matlab software. The algorithm was implemented using an adaptive resonance approach.
CHAPTER THREE
METHODOLOGY
The success of any predictive modeling endeavor lies in the careful and systematic approach to
data collection, preprocessing, and model development. In this chapter, we delve into the
methodology employed to construct a robust and effective predictive model for breast cancer prediction. Each step of the process is designed to ensure the reliability and accuracy of the predictive model. This chapter provides a detailed
account of the steps undertaken, beginning with the selection and collection of pertinent data,
followed by rigorous preprocessing measures to prepare the dataset for analysis. The choice of a
suitable supervised machine learning algorithm and the intricacies of model training are also explored.
By elucidating the methodology, this chapter aims to offer transparency into the research
process, enabling replication and validation of results. The careful consideration of each step in
the development of the predictive model is paramount to its success and, ultimately, to its
potential impact on early breast cancer detection and improved patient outcomes.
Machine learning is a branch of artificial intelligence that aims to build systems that have the ability to automatically learn and improve from experiences without
being explicitly programmed. Deep learning is a type of machine learning and artificial
intelligence (AI) that imitates the way humans gain certain types of knowledge. While traditional
machine learning algorithms are linear, deep learning algorithms are stacked in a hierarchy of increasing complexity and abstraction.
At its most basic sense, machine learning uses programmed algorithms that learn and optimize
their operations by analyzing input data to make predictions within an acceptable range. With the
feeding of new data, these algorithms tend to make more accurate predictions. Although there are
some variations of how to group machine learning algorithms, they can be divided into three
broad categories according to their purposes and the way the underlying machine is being taught.
These three categories are: supervised, unsupervised and semi-supervised. There also exists a
fourth category known as reinforcement ML. Figure 3.1 shows an illustration of the classification of machine learning algorithms.
Figure 3.1: Classification of Machine Learning algorithms (Dasgupta and Nath 2016)
3.2.1 Supervised Machine Learning Algorithms
In this type of algorithm, a model gains knowledge from data that contains predefined examples with both the input and the expected output, so that it can compare its output with the correct output. The classification problem is one of the standard formulations of the supervised learning task, where the data is mapped into a class after looking at numerous input-output examples of a function.
Supervised learning is a branch of ML which deals with a given dataset consisting of multiple
data along with their corresponding classes. It can be used both for decision trees and artificial
neural networks. In decision trees it can be used to determine which attributes of the given data provide the most relevant information. In artificial neural networks, the models are trained on the given dataset and the classification of unknown data samples is then carried out.
1 Logistic Regression: Logistic regression (LR) is a powerful and well-established method for supervised classification (Dasgupta and Nath, 2016). It can be
considered as an extension of ordinary regression and can model only a dichotomous variable
which usually represents the occurrence or non-occurrence of an event. LR helps in finding the
probability that a new instance belongs to a certain class. Since it is a probability, the outcome
lies between 0 and 1. Therefore, to use the LR as a binary classifier, a threshold needs to be
assigned to differentiate the two classes. For example, a probability value higher than 0.50 for an instance assigns it to the positive class, and to the negative class otherwise.
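A minimal sketch of this thresholding behaviour is given below, assuming scikit-learn and its built-in Wisconsin data rather than the project's final pipeline.

```python
# Sketch of logistic regression as a binary classifier with a 0.50 threshold.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

lr = LogisticRegression(max_iter=5000).fit(X_train, y_train)

proba = lr.predict_proba(X_test)[:, 1]        # probability of class 1
pred = (proba >= 0.50).astype(int)            # apply the 0.50 decision threshold
print("Test accuracy:", (pred == y_test).mean())
```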
2 Support Vector Machine (SVM): Support vector machine (SVM) algorithm can classify both
linear and non-linear data. It first maps each data item into an n-dimensional feature space where
n is the number of features. It then identifies the hyper plane that separates the data items into
two classes while maximizing the marginal distance for both classes and minimizing the
classification errors. The marginal distance for a class is the distance between the decision hyper
plane and its nearest instance which is a member of that class. Figure 2.2 shows an illustration of
the support Vector machine. The SVM has identified a hyper plane (actually a line) which
maximizes the separation between the 'star' and 'circle' classes. More formally, each data point
is plotted first as a point in an n-dimension space (where n is the number of features) with the
value of each feature being the value of a specific coordinate. To perform the classification, we
then need to find the hyper plane that differentiates the two classes by the maximum margin (S. Uddin et al., 2019).
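The sketch below shows a maximum-margin SVM of this kind in scikit-learn; the linear kernel and C value are illustrative assumptions rather than tuned project settings.

```python
# Sketch of a maximum-margin SVM classifier on the Wisconsin data;
# the linear kernel mirrors the separating-hyperplane picture described above.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
svm.fit(X_train, y_train)
print("Test accuracy:", svm.score(X_test, y_test))
```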
3 Decision Tree (DT): A decision tree (DT) is one of the earliest and most prominent machine learning algorithms. A decision tree arranges tests and their corresponding outcomes for classifying data items into a tree-
like structure. The nodes of a decision tree normally have multiple levels where the first or top-
most node is called the root node. All internal nodes (i.e., nodes having at least one child)
represent tests on input variables or attributes. Figure 2.3 shows an illustration of the Decision
Tree. Each variable (C1, C2, and C3) is represented by a circle and the decision outcomes (Class
A and Class B) are shown by rectangles. In order to successfully classify a sample to a class,
each branch is labelled with either 'True' or 'False' based on the outcome value from the test at its parent node.
Depending on the test outcome, the classification algorithm branches towards the appropriate
child node where the process of test and branching repeats until it reaches the leaf node. The leaf
or terminal nodes correspond to the decision outcomes. DTs have been found easy to interpret
and quick to learn, and are a common component to many medical diagnostic protocols. When
traversing the tree for the classification of a sample, the outcomes of all tests at each node along
the path will provide sufficient information to conjecture about its class.
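As an illustration, the following sketch trains a small decision tree in scikit-learn and prints its root node, internal tests and leaf decisions; the depth limit is an assumed value.

```python
# Sketch of a decision tree classifier; export_text prints the learned tests
# and leaf decisions in a readable tree-like form.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
print(export_text(tree, feature_names=list(data.feature_names)))
print("Test accuracy:", tree.score(X_test, y_test))
```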
4 Random Forest (RF): A random forest (RF) is an ensemble classifier consisting of many
DTs similar to the way a forest is a collection of many trees. DTs that are grown very deep often
cause overfitting of the training data, resulting in a high variation in classification outcome for a
small change in the input data. They are very sensitive to their training data, which makes them
error-prone to the test dataset. The different DTs of an RF are trained using different parts of the training dataset. Figure 2.4 shows an illustration of the RF algorithm which consists of three different decision
trees. Each of those three decision trees was trained using a random subset of the training data.
To classify a new sample, the input vector of that sample is required to pass down with each DT
of the forest. Each DT then considers a different part of that input vector and gives a
classification outcome. The forest then chooses the classification having the most 'votes' (for a
discrete classification outcome) or the average of all trees in the forest (for numeric classification
outcome). Since the RF algorithm considers the outcomes from many different DTs, it can
reduce the variance resulting from the consideration of a single DT for the same dataset.
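A brief sketch of such an ensemble in scikit-learn is shown below; the number of trees is an assumed value.

```python
# Sketch of a random forest: many trees trained on random subsets of the data,
# with the final class decided by majority vote, as described above.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
print("Test accuracy:", rf.score(X_test, y_test))
```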
5 Naïve Bayes (NB): Naïve Bayes is a classification technique based on Bayes' theorem. This theorem can
describe the probability of an event based on the prior knowledge of conditions related to that
event. This classifier assumes that a particular feature in a class is not directly related to any
other feature although features for that class could have interdependence among themselves. By
considering the task of classifying a new object (white circle) into either the 'green' class or the 'red' class, Figure 2.5 shows an illustration of the Naïve Bayes algorithm. According to this figure, it is reasonable to believe that any new object is twice as likely to have 'green' membership rather than 'red', since there are twice as many 'green' objects (40) as 'red'. In the Bayesian analysis, this belief is known as the prior probability. Therefore, the prior probabilities of 'green' and 'red' are 0.67 (40 ÷ 60) and 0.33 (20 ÷ 60), respectively. Now, to classify the 'white' object, we need to draw a circle around this object which encompasses several points (the number to be chosen beforehand) irrespective of their class labels. Four points (three 'red' and one 'green') were considered in this figure. Thus, the likelihood of 'white' given 'green' is 0.025 (1 ÷ 40) and the likelihood of 'white' given 'red' is 0.15 (3 ÷ 20). Although the prior probability indicates that the new 'white' object is more likely to have 'green' membership, the likelihood shows that it is more likely to be in the 'red' class. In the Bayesian analysis, the final classifier is produced by combining both sources of information (i.e., prior probability and likelihood value). The 'multiplication' function is used to combine these two types of information, and the product is called the 'posterior' probability. Finally, the posterior probability of 'white' being 'green' is 0.017 (0.67 × 0.025) and the posterior probability of 'white' being 'red' is 0.049 (0.33 × 0.15). Therefore, the 'white' object should be classified as a member of the 'red' class according to the NB
technique.
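The short sketch below simply reproduces the numbers of this worked example in code (priors from the class counts, likelihoods from the points inside the circle, and posteriors by multiplication); it is illustrative only.

```python
# Reproducing the Naïve Bayes worked example above in code.
n_green, n_red = 40, 20
prior_green, prior_red = n_green / 60, n_red / 60      # about 0.67 and 0.33

likelihood_green = 1 / n_green                          # 1 green neighbour -> 0.025
likelihood_red = 3 / n_red                              # 3 red neighbours  -> 0.15

posterior_green = prior_green * likelihood_green        # about 0.017
posterior_red = prior_red * likelihood_red              # about 0.049

print("white is classified as", "red" if posterior_red > posterior_green else "green")
```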
6 K-Nearest Neighbor (KNN): The K-nearest neighbor (KNN) algorithm is one of the simplest and earliest classification algorithms. It can be thought of as a simpler version of an NB classifier. Unlike the NB technique, KNN does not compute probabilities; it assigns a sample to the class most common among its nearest neighbors. The 'K' in the KNN algorithm is the number of nearest neighbors considered to take a 'vote' from. The selection of different values for 'K' can generate different classification results for the same sample object. Figure 2.6 shows an illustration of the KNN algorithm. For K=3, the new object (star) is classified as 'black'; however, it is classified as 'red' when K=5.
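The following sketch illustrates how the choice of K can change the outcome, again using scikit-learn's Wisconsin data as a stand-in for the project dataset.

```python
# Sketch of KNN: different values of K change the vote and can change the result.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for k in (3, 5):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(f"K={k} test accuracy:", knn.score(X_test, y_test))
```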
In unsupervised learning, only input data is provided to the model, without the use of labeled datasets. Unsupervised learning algorithms do not use labeled input and output data. An example of such methods is clustering, which is suitable when the output variables (i.e. the labels) are not provided. Some examples are given below:
1 K-Means Clustering: K-means is a clustering algorithm that provides a partition of the data in the form of small clusters. The algorithm is used to find the similarity between different data points, and each data point is assigned to exactly the one cluster that is most suitable for it (a short sketch follows this list).
2 C-Means Clustering: Clusters are identified on the basis of similarity; a cluster that consists of similar data points belongs to one single family. In the C-means algorithm each data point belongs to one single cluster. It is mostly used in medical image segmentation and disease prediction.
3 Hierarchical Clustering: This algorithm organizes the data in the form of a matrix. Each cluster is separated from the other clusters in the form of a hierarchy, and every single cluster consists of similar data points.
4 Expectation Maximization (EM): A probabilistic model is used to assign data points to clusters. EM is known as a soft clustering technique which computes the probability of cluster membership by alternating expectation and maximization steps.
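As an illustration of the first of these techniques, the sketch below applies K-means with two clusters to the Wisconsin data; the known class labels are used only afterwards to inspect how well the clusters match the two diagnostic groups.

```python
# Sketch of K-means clustering on standardized, unlabeled samples.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X_scaled)
labels = kmeans.labels_

# Clusters carry no class names; check agreement with the known classes both ways.
agreement = max((labels == y).mean(), (labels != y).mean())
print("Cluster sizes:", np.bincount(labels), "agreement with true classes:", round(agreement, 3))
```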
The summary of the project methodology is shown in Figure 3.1. This project aims to classify breast tumors as either malignant (cancerous) or benign (non-cancerous).
Figure 3.1: Project Methodology Flowchart
For that, we use digitized histopathology images of fine-needle aspiration (FNA) biopsies together with machine learning. First, the CNN model is built and trained in Colab by importing the chosen data set into it. Then, once a high accuracy is achieved, a web app is created on the front end to allow a new prediction to be made for any patient's image data.
Google Colab was preferred to Kaggle for training the model because it is very simple to use and also has default code to load a dataset directly into the model.
The data set used is the Kaggle breast-cancer-wisconsin data set. Dr. William H. Wolberg, from the University of Wisconsin Hospitals,
Madison, obtained this breast cancer database. Figure 3.2 shows the first five rows and
columns of the data set. In this data set there are 30 input parameters and 569 patient cases. Target variables can only take two values in a classification model: 0 or 1, representing the benign and malignant classes.
Figure 3.2: Section of data set showing first five rows and columns
Figure 3.3: Section of data set showing the dataset information
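As a simplified, self-contained sketch of the load, split, train and evaluate workflow summarized in Figure 3.1, the code below uses the tabular 30-feature Wisconsin data bundled with scikit-learn and a random forest as a stand-in for the CNN and web front end described above; it is illustrative rather than the project's final implementation.

```python
# Simplified end-to-end sketch on the tabular Wisconsin data (30 features, 569 cases).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X, y = data.data, data.target        # in scikit-learn's copy, 0 = malignant, 1 = benign

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test),
                            target_names=list(data.target_names)))
```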
REFERENCES
A. Biswal, "Top 10 deep learning algorithms you should know in 2023."
A. Dasgupta and A. Nath, "Classification of machine learning algorithms," International Journal, 11, 03 2016.
https://insights.daffodilsw.com/blog/10-uses-of-artificial-intelligence-in-day-to-day-life
https://www.cancer.org/cancer/breast-cancer/screening-tests-and-early-detection/breastmri-
D. Dahiwade, G. Patle, and E. Meshram, "Designing disease prediction model using machine learning approach."
D. Delen, G. Walker, and A. Kadam, "Predicting breast cancer survivability: A comparison of three data mining methods," Artificial Intelligence in Medicine, vol. 34, pp. 113–127, 07 2005.
F. Noreen, L. Liu, H. Sha, and H. Ahmed, “Prediction of breast cancer, comparative review of
machine learning techniques, and their analysis,” IEEE Access, vol. PP, pp. 1–1, 08 2020.
H. Chen, "An efficient diagnosis system for detection of Parkinson's disease using fuzzy k-nearest neighbor approach."
H. Dhahri, "Automated breast cancer diagnosis based on machine learning algorithms," Journal of
machine learning techniques for predicting breast cancer recurrence,” Journal of Health
M. Lundin, J. Lundin, H. Burke, S. Toikkanen, L. Pylkkänen, and H. Joensuu, "Artificial neural networks applied to survival prediction in breast cancer," Oncology, vol. 57, pp. 281–286, 12 1999.
M. Tan, B. Zheng, J. Leader, and D. Gur, "Association between changes in mammographic image features and risk for near-term breast cancer development," IEEE Transactions on Medical Imaging.
P. Sengar, M. Gaikwad, and D.-A. Nagdive, "Comparative study of machine learning algorithms for breast cancer prediction."
S. Jain and P. Kumar, “Prediction of breast cancer using machine learning,” Recent
S. Uddin, A. Khan, M. Hossain, and M. A. Moni, “Comparing different supervised machine learning
algorithms for disease prediction,” BMC Medical Informatics and Decision Making, vol. 19,
12 2019.
radiology: comparison of logistic regression and artificial neural network models in breast
Y. Dengju, J. Yang, and X. Zhan, “A novel method for disease prediction: Hybrid of random forest
A. Bharat, N. Pooja, and R. Reddy, "Using machine learning algorithms for breast cancer risk prediction and diagnosis."
Hadidi, A. Alarabeyyat, and M. Alhanahnah, "Breast cancer detection using k-nearest neighbor machine learning algorithm."
N. Khuriwal and N. Mishra, “Breast cancer diagnosis using deep learning algorithm,” 10
2018.
B. Gayathri and C. Sumathi, “Comparative study of relevance vector machine with various machine
learning techniques used for detecting breast cancer,” pp. 1–5, 12 2016.
R. Shubair, “Comparative study of machine learning algorithms for breast cancer detection and
diagnosis,” 12 2016.
Z. Wang, M. Li, H. Wang, H. Jiang, Y. Yao, H. Zhang, and J. Xin, “Breast cancer detection using
extreme learning machine based on feature fusion with CNN deep features," IEEE Access.
Y. Xiao, J. Wu, Z. Lin, and X. Zhao, "Breast cancer diagnosis using an unsupervised feature extraction algorithm."
J. Bhat, V. George, and B. Malik, "Cloud computing with machine learning could help us in the early detection of breast cancer."