
Volume 9, Issue 10, October– 2024 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/IJISRT24OCT1557

Machine Learning Based Decision Support System for the Diagnosis of Breast Cancer

İlker Çakar1
https://orcid.org/0000-0002-9659-4557
Department of Electrical and Electronics Engineering, Faculty of Engineering, Sakarya University, Serdivan, Sakarya, Turkey

Muhammed Kürşad UÇAR2
https://orcid.org/0000-0002-0636-8645
Department of Electrical and Electronics Engineering, Faculty of Engineering, Sakarya University; MKU Technology, Technology Development Zones, Serdivan, Sakarya, Turkey

Abstract:- Breast cancer is among the most prevalent diseases encountered among women worldwide. Early diagnosis of breast cancer is crucial for the treatment of the disease. Detecting the disease at an early stage prevents deaths resulting from the condition. Recently, computer-aided systems have been developed to ensure early-stage diagnosis and accuracy of breast cancer. Computer-aided systems developed with machine learning approaches significantly contribute to the process of diagnosing breast cancer. The aim of this study is to propose a new classification system based on machine learning algorithms developed for the diagnosis of breast cancer. In this study, sub-data sets were created by reducing features, and data cleaning processes were applied. After these procedures, stages such as feature selection and feature extraction were applied. In this study, classification processes such as Ensemble, k-Nearest Neighbors (kNN), Support Vector Machines (SVMs), and Hybrid Artificial Intelligence were used in line with machine learning. With the obtained results, a Breast Cancer diagnosis algorithm was created. Performance evaluation criteria such as accuracy rate, specificity, sensitivity, kappa number, and F-Measure were applied to the created algorithms. In the results obtained in this study, the highest accuracy rate was found to be 99.3% with the Ensemble method, the highest specificity rate was 98.7% with the Ensemble method, and the highest sensitivity rate was found to be 100% with many methods. In light of these results, it was observed that the machine learning algorithms used in this study, implemented in the Matlab environment, were effective. Consequently, it was proven that higher accuracy, specificity, and sensitivity rates can be found with different machine learning techniques. This also demonstrates that the study in our article is a reliable one in detecting diseased and healthy individuals in the diagnosis of breast cancer, showing that it is a more applicable and feasible study in the healthcare field.

Keywords:- Breast Cancer Diagnosis; Machine Learning; Ensemble Method; Performance Review; Hybrid Artificial Intelligence.

I. INTRODUCTION

Breast cancer is a disease where cells in the breast grow uncontrollably [1]. Breast cancer ranks among the most common cancers seen in women worldwide [2]. It is the most frequently diagnosed cancer among women in the United States [3]. Approximately 30% of newly diagnosed cancers in women each year are breast cancer [3]. It is important to understand that most breast lumps are benign and not cancerous (malignant) [4]. The histological grade of the tumor, a well-established prognostic factor, is crucial in guiding appropriate treatment in clinical practice [5]. Additionally, detecting the disease at early stages can help prevent increased mortality [6]. If left unchecked, malignant tumors can spread throughout the body and be fatal [7]. However, there is no one-size-fits-all treatment approach for breast cancer [8]. Factors such as the type and stage of breast cancer and the individual's lifestyle are considered for treatment options [8]. Generally, there are five treatment options, and most treatment plans involve a combination of the following: surgery, radiation, hormone therapy, chemotherapy, and targeted therapies [9]. Some are local and target only the area around the tumor [9]. Others are systemic and target the entire body with cancer-fighting agents [9]. Despite all these treatment methods, if cancer has spread to other parts of the body, it is usually incurable but can normally be effectively controlled for a long period [10].

In recent years, there has been an increasing trend towards the integration of computer-aided techniques in the field of breast cancer to enhance the accuracy and efficacy of diagnosis and treatment [11]. Machine learning techniques and medical imaging aid in this process [12]. Computer-aided intelligent and automated diagnostic systems developed with machine learning approaches are significant tools in analysis and can support medical professionals in the diagnosis of breast cancer, playing a role in the medical decision-making process [13]. Recently, various techniques such as deep learning, alongside machine learning techniques, have been utilized in the medical field [2], [14], [15], [16], [17]. Additionally, data mining techniques have been considered a straightforward method for understanding and predicting data [18]. Microwave imaging is also among the prevalent imaging techniques for early-stage screening and monitoring of breast cancer [19]. Despite the presentation of numerous methods,


most fail to provide accurate and consistent results [2]. Moreover, existing systems require higher accuracy rates and less computation time [20]. However, all these existing studies have not yet achieved a consistent accuracy rate.

Machine learning used in the diagnosis of breast cancer is defined as the process of using data to discover hidden information that is not easily identifiable [21]. The primary goal of machine learning is to enable a system to learn without human intervention, which helps in designing an automatic system for decision-making [22]. In the literature, the use of Machine Learning (ML) has also been suggested in previous studies [23]. However, improving the prediction accuracy of the machine learning model has been seen as a significant challenge and research gap [24]. Despite all these challenges, researchers have presented numerous machine learning techniques in previous articles to address the classification difficulty of breast cancer [25].

In this study, classification processes such as Ensemble, kNN, SVMs, and Hybrid Artificial Intelligence were applied in line with machine learning. With the obtained results, a Breast Cancer diagnosis algorithm was created. Studies in the literature have shown that different machine learning and feature selection algorithms have been used on data sets with varying characteristics for breast cancer diagnosis. Various performance metrics such as accuracy, the area under the ROC curve, recall, sensitivity, specificity, and kappa statistics have been used in the literature to evaluate the performance of machine learning models. However, it has been observed that most studies do not exceed a performance criteria ratio of 99.68% in the machine learning model [26]. In this study, performance evaluation criteria similar to those in the literature, such as accuracy rate, specificity, sensitivity, kappa number, and F-Measure, were applied [27], [28]. The performance evaluation criteria specified in this study showed similarities to some studies in the literature. In studies using the same data set and similar machine learning algorithms, the highest accuracy rate was found to be 82.70% with the Random Forest method, the highest specificity rate was 84% with the SVMs method, and the highest sensitivity rate was 84% with the Extreme Boost method [29]. In the results obtained in this study, the highest accuracy rate was found to be 99.3% with the Ensemble method, the highest specificity rate was 98.7% with the Ensemble method, and the highest sensitivity rate was 100% with many methods. In some studies, as in this study, common stages such as creating sub-data sets by reducing features, data cleaning, feature selection, and feature extraction were applied, but high accuracy rates were not achieved [30], [31], [32]. In some studies, machine learning algorithms have been developed on different platforms such as R programming, Weka, Spark, and Python [33], [34], [35]. It has been observed that the machine learning algorithms applied on these platforms were less effective compared to those in this study.

II. MATERIAL AND METHODS

The workflow applied in the study is summarized in Figure 1. Initially, feature selection was performed on the breast cancer data set obtained from individuals. According to the Eta feature selection algorithm, the 14 features in the breast cancer data set were ranked starting from the best feature. Based on this ranking, 14 different sub-data sets were created. Subsequently, the sub-data sets were balanced. As a result of the data balancing process, 84 additional sub-data sets were created. Finally, classification processes such as Ensemble, kNN, SVMs, and Hybrid Artificial Intelligence were applied to the balanced sub-data sets. The diagnosis of breast cancer was attempted based on the compared classification processes.

Fig 1 The Workflow in the Study

A. Data Set
The dataset utilized in this study was obtained from the publicly available website "www.kaggle.com" [36].

The dataset used in this study includes information from 4024 different individuals, encompassing age, race, marital status, T stage, N stage, 6th stage, grade, A stage, tumor size, estrogen status, progesterone status, examined regional nodes, examined positive nodes, and months of survival. Based on this information, the survival and death outcomes of 4024 individuals were classified.
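The sub-data set construction described above can be sketched as follows. The feature ranking is taken from Table 6 (feature 14, Survival Months, is ranked first); the function itself is an illustrative Python reconstruction, not the paper's MATLAB pipeline.

```python
# Build the 14 nested sub-data sets used in the study: features are ranked by
# their Eta score (the ranking below follows Table 6), and the k-th sub-data
# set keeps the top-k features. Illustrative sketch; the paper used MATLAB.

def build_subsets(ranked_features):
    """Return nested sub-data sets: top-1, top-2, ..., top-n features."""
    return [ranked_features[:k] for k in range(1, len(ranked_features) + 1)]

# Feature numbers as ranked in Table 6 (14 = Survival Months is ranked first).
ranking = [14, 6, 13, 5, 10, 11, 7, 4, 9, 8, 1, 12, 3, 2]
subsets = build_subsets(ranking)
print(len(subsets))   # 14 sub-data sets
print(subsets[2])     # the top-3 subset: features 14, 6, 13
```

Each of these 14 subsets is then balanced six ways, giving the 84 additional sub-data sets mentioned above.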


Table 1 Mathematical Representation of Features and Codes [37]

No | Feature
1 | Kurtosis
2 | Skewness
3 | IQR *
4 | CV
5 | Geometric Mean
6 | Harmonic Mean
7 | Activity - Hjorth Parameters
8 | Mobility - Hjorth Parameters
9 | Complexity - Hjorth Parameters
10 | Maximum *
11 | Median
12 | Mean Absolute Deviation *
13 | Minimum *
14 | Central Moments *
15 | Mean
16 | Average Curve Length
17 | Average Energy
18 | Root Mean Squared
19 | Standard Error
20 | Standard Deviation
21 | Shape Factor
22 | Singular Value Decomposition *
23 | 25% Trimmed Mean *
24 | 50% Trimmed Mean *
25 | Average Teager Energy

* The property was computed using MATLAB
IQR: Interquartile Range; CV: Coefficient of Variation
σ²: variance of the signal
σ₁²: variance of the 1st derivative of the signal
σ₂²: variance of the 2nd derivative of the signal

B. Data Preprocessing
The data has been prepared for analysis. The data preparation process, known in the literature as data preprocessing, has been elaborated in detail under the headings formulated by Han and Kamber (2006) [38]. The data preprocessing steps used in the study are outlined sequentially below.

 Data Grouping
The raw dataset comprises 14 features associated with 4024 individuals, represented with specific mathematical values. These 14 features are Age, Race, Marital Status, T Stage, N Stage, 6th Stage, Grade, A Stage, Tumor Size, Estrogen Status, Progesterone Status, Regional Node Examined, Regional Node Positive, and Survival Months. Among these, Age, Tumor Size, Regional Node Examined, Regional Node Positive, and Survival Months have numerical values and are not assigned any categorical values in the dataset. However, the other features, being non-numerical, are each assigned a numerical value. These assignments are illustrated in Table 2 and Table 3. This procedure is implemented to ensure the dataset's effective performance with artificial intelligence algorithms.


Table 2 Description of Nominal Attributes

Race: 1=White (85%), 2=Black (7%), 3=Other (American Indian/AK Native, Asian/Pacific Islander) (8%)

Marital Status: 1=Single (never married) (15%), 2=Divorced (12%), 3=Separated (1%), 4=Married (including common law) (66%), 5=Widowed (6%)

T Stage: 1=T1 (40%), 2=T2 (44%), 3=T3 (13%), 4=T4 (3%). In the TNM system, "T" plus a letter or number (0 to 4) is used to describe the size and location of the tumor. Tumor size is measured in centimeters (cm); a centimeter is roughly equal to the width of a standard pen or pencil.

N Stage: 1=N1 (68%), 2=N2 (20%), 3=N3 (12%). The "N" in the TNM staging system stands for lymph nodes, the small, bean-shaped organs that help fight infection. Lymph nodes near where the cancer started are called regional lymph nodes.

6th Stage: 1=IIA (32%), 2=IIB (28%), 3=IIIA (26%), 4=IIIB (2%), 5=IIIC (12%). If surgery is the first treatment for the cancer, the doctor will generally confirm the stage when the testing after surgery is finalized, usually about 5 to 7 days after surgery. When systemic treatment is given before surgery, which is typically with medications and is called neoadjuvant therapy, the stage of the cancer is primarily determined clinically. Doctors may refer to stage I to stage IIA cancer as "early stage" and stage IIB to stage III as "locally advanced."

Grade: 1=Well differentiated, Grade I (13%), 2=Moderately differentiated, Grade II (58%), 3=Poorly differentiated, Grade III (28%), 4=Undifferentiated/anaplastic, Grade IV (1%). The grade describes how a cancer cell looks under the microscope and whether it is similar or very different to normal cells.

A Stage: 1=Regional (98%), 2=Distant (2%). The SEER database tracks 5-year relative survival rates for breast cancer in the United States based on how far the cancer has spread. Rather than grouping cancers by AJCC TNM stages (stage 1, stage 2, stage 3, etc.), it groups them into regional and distant stages.

Estrogen Status: 1=Positive (93%), 2=Negative (7%)

Progesterone Status: 1=Positive (83%), 2=Negative (17%)

Class: 1=Alive (85%), 2=Dead (15%)

Table 3 Description of Numeric Attributes

Attribute | Min | 25% | 50% | 75% | Max | Mean
Age | 39 | 47 | 54 | 61 | 69 | 54
Tumor Size | 1 | 16 | 25 | 38 | 140 | 30.5
Regional Node Examined | 1 | 9 | 14 | 19 | 61 | 14.4
Regional Node Positive | 1 | 1 | 2 | 5 | 46 | 4.16
Survival Months | 1 | 56 | 73 | 90 | 107 | 71.3

 Data Balancing
The classifications resulting from the 14 features associated with 4024 individuals in the dataset are defined under the "Class" category as either "Alive" or "Dead." However, due to the significant difference in the number of Alive and Dead individuals, the dataset underwent a data balancing process to obtain more accurate results from AI-based algorithms. Consequently, the number of Alive individuals in the dataset was sequentially divided into six segments. Each segment was combined with the Dead individuals. As a result, six different subsets of the dataset were created, each containing 568 Alive individuals and 616 Dead individuals, resulting in a total of 1184 individuals per subset. The distribution of this dataset is shown in Table 4.


Table 4 Distribution of Training – Test Data Set

Dataset | Alive | Dead | Total
A Dataset | 568 | 616 | 1184
B Dataset | 568 | 616 | 1184
C Dataset | 568 | 616 | 1184
D Dataset | 568 | 616 | 1184
E Dataset | 568 | 616 | 1184
F Dataset | 568 | 616 | 1184
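The balancing step can be sketched directly from the counts above: the 3408 Alive records (6 x 568) are cut into six consecutive segments, and each segment is combined with all 616 Dead records. The record contents below are placeholders; only the counts mirror Table 4.

```python
# Sketch of the data balancing described above: split the majority (Alive)
# class into six segments and pair each segment with the full Dead class,
# yielding six subsets of 568 + 616 = 1184 individuals.
alive = [("Alive", i) for i in range(3408)]
dead = [("Dead", i) for i in range(616)]

def balance(alive_rows, dead_rows, n_segments=6):
    seg = len(alive_rows) // n_segments   # 568 Alive records per segment
    return [alive_rows[k * seg:(k + 1) * seg] + dead_rows
            for k in range(n_segments)]

subsets = balance(alive, dead)
print(len(subsets), len(subsets[0]))  # 6 subsets of 1184 records each
```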

C. Feature Extraction
Feature extraction was performed by calculating the dataset's features according to the specific formulas applied to data from 4024 different individuals. The formula calculations used for feature extraction are presented in Table 1. Generally, class labels are unordered categorical variables.
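A few of the Table 1 features can be illustrated with their standard textbook definitions. The paper computed these in MATLAB; the Python below is a hedged sketch in which the Hjorth parameters (Activity, Mobility, Complexity) are built from the variances of the signal and of its first and second differences, and the toy input values are assumptions.

```python
import statistics as st

# Illustrative definitions for a few Table 1 features (not the paper's code).
def hjorth(signal):
    """Hjorth Activity, Mobility, Complexity from a numeric sequence."""
    d1 = [b - a for a, b in zip(signal, signal[1:])]   # 1st difference
    d2 = [b - a for a, b in zip(d1, d1[1:])]           # 2nd difference
    activity = st.pvariance(signal)
    mobility = (st.pvariance(d1) / activity) ** 0.5
    complexity = (st.pvariance(d2) / st.pvariance(d1)) ** 0.5 / mobility
    return activity, mobility, complexity

values = [1.0, 2.0, 4.0, 3.0, 5.0, 4.0]
activity, mobility, complexity = hjorth(values)
print(st.geometric_mean(values), st.harmonic_mean(values))  # features 5 and 6
print(activity, mobility, complexity)                       # features 7-9
```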

Table 5 Features of Data Set


Number Attribute Unit Data Type
1 Age 30 – 69 Numeric
2 Race 1,2,3 Nominal
3 Marital Status 1,2,3,4,5 Nominal
4 T Stage 1,2,3,4 Nominal
5 N Stage 1,2,3 Nominal
6 6th Stage 1,2,3,4,5 Nominal
7 Grade 1,2,3,4 Nominal
8 A Stage 1,2 Nominal
9 Tumor Size 1 – 140 Numeric
10 Estrogen Status 1,2 Nominal
11 Progesterone Status 1,2 Nominal
12 Regional Node Examined 1 – 61 Numeric
13 Regional Node Positive 1 – 46 Numeric
14 Survival Months 1 – 107 Numeric
15 Class 1,2 Nominal

D. Feature Selection Algorithm
Various correlation calculation methods exist in the literature, each requiring an appropriate correlation formula based on the specific data group [39].

In this study, the Eta feature selection algorithm was employed. Eta is a method used between class labels and numerical variables. In this method, the F-Score is calculated as the threshold value. Subsequently, features are selected using two different methods: the first method selects features that exceed the threshold value, and the second method ranks the features from highest to lowest and selects the top 20%. This process involves first calculating an Eta value for each feature. Then, an average Eta value is determined. According to this average, the first method selects features above the threshold value, while the second method selects the features within the top 20% based on their Eta values.

The features selected through the Eta feature selection algorithm, along with their percentage representation in the performance evaluation ranking, are shown in Table 6. The calculated correlation values of the features selected according to Eta are presented in Table 7.

Table 6 The Number of Features Selected According to the Eta Criterion


Percentage (%) Feature Number
7 14
14 14,6
21 14,6,13
29 14,6,13,5
36 14,6,13,5,10
43 14,6,13,5,10,11
50 14,6,13,5,10,11,7
57 14,6,13,5,10,11,7,4
64 14,6,13,5,10,11,7,4,9
71 14,6,13,5,10,11,7,4,9,8
79 14,6,13,5,10,11,7,4,9,8,1


86 14,6,13,5,10,11,7,4,9,8,1,12
93 14,6,13,5,10,11,7,4,9,8,1,12,3
100 14,6,13,5,10,11,7,4,9,8,1,12,3,2

Table 7 The Correlation Values of the Features Selected Based on the Eta Criterion

Feature No | Eta Score
14 | 0.4765
6 | 0.2576
13 | 0.2566
5 | 0.2558
10 | 0.1847
11 | 0.1771
7 | 0.1614
4 | 0.1547
9 | 0.1342
8 | 0.0966
1 | 0.0559
12 | 0.0348
3 | 0.0315
2 | 0.0042
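The Eta scoring and thresholding described above can be sketched as the classical correlation ratio between the class labels and a numeric feature, followed by selection against the mean Eta. The formula is the standard correlation-ratio definition and the toy data are assumptions; the paper's actual scores are those in Table 7.

```python
import math

# Sketch of Eta (correlation ratio) feature scoring: each numeric feature is
# scored against the class labels, then features above the mean Eta are kept.
def eta(classes, values):
    groups = {}
    for c, v in zip(classes, values):
        groups.setdefault(c, []).append(v)
    grand = sum(values) / len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2
                     for g in groups.values())
    ss_total = sum((v - grand) ** 2 for v in values)
    return math.sqrt(ss_between / ss_total)

labels = [1, 1, 1, 2, 2, 2]                  # 1 = Alive, 2 = Dead (toy)
survival_months = [90, 80, 85, 20, 30, 25]   # separates the classes well
tumor_size = [20, 35, 25, 30, 22, 28]        # barely informative here

scores = {"Survival Months": eta(labels, survival_months),
          "Tumor Size": eta(labels, tumor_size)}
mean_eta = sum(scores.values()) / len(scores)
selected = [f for f, s in scores.items() if s > mean_eta]
print(selected)  # ['Survival Months']
```

This mirrors the ranking in Table 7, where Survival Months (feature 14) has by far the highest Eta score.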

E. Machine Learning Algorithms
In this study, Ensemble, k-Nearest Neighbors (kNN), Support Vector Machines (SVMs), and Hybrid Artificial Intelligence machine learning classification algorithms were employed. The classification process was conducted separately on six datasets, as indicated in Table 4, with half of the data used for model training and the other half for model testing. A training dataset was created for each data group, and the remaining data was utilized for testing. The model built from the training data was then evaluated on the test data using performance evaluation criteria to assess its effectiveness.

 Ensemble Decision Trees
Ensemble decision trees are algorithm-based systems that are essential for data interpretation, particularly for large datasets. These trees generate algorithms that provide systematic guidance. Unlike standard decision trees, ensemble decision trees compare all decision trees within the system and integrate them into a single, unified decision tree.

For instance: In the context of deciding whether to rent or buy a house, variables such as price, number of rooms, and square footage are considered. The output variable is binary, indicating either yes (rent or buy) or no. When determining criteria such as a price below X and square footage of at least Y, a decision tree model is effectively constructed [40]. Fig 2 illustrates the decision tree model.

Fig 2 Ensemble Decision Trees [41]
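The ensemble idea can be sketched as trees fit on bootstrap samples whose votes are merged into one decision. For brevity, each "tree" below is a one-split stump on a single feature; this illustrates the voting principle under stated assumptions (toy data, stump learners), not the paper's exact MATLAB ensemble.

```python
import random

# Hedged sketch of an ensemble of decision stumps with majority voting.
def fit_stump(rows):
    """rows: list of ([features], label) with labels 1 or 2.
    Exhaustively pick the single-feature threshold split with best accuracy."""
    best = (-1.0, 0, 0.0, (1, 2))
    for f in range(len(rows[0][0])):
        for t in {x[f] for x, _ in rows}:
            for sides in ((1, 2), (2, 1)):
                acc = sum((sides[0] if x[f] <= t else sides[1]) == y
                          for x, y in rows) / len(rows)
                if acc > best[0]:
                    best = (acc, f, t, sides)
    return best[1:]

def ensemble_predict(stumps, x):
    votes = [lo if x[f] <= t else hi for f, t, (lo, hi) in stumps]
    return max(set(votes), key=votes.count)   # majority vote

random.seed(0)
# Toy data with one feature (e.g. survival months); label 1 above 50, else 2.
data = [([v], 1 if v > 50 else 2) for v in range(20, 100, 5)]
stumps = [fit_stump(random.choices(data, k=len(data))) for _ in range(9)]
print(ensemble_predict(stumps, [95]), ensemble_predict(stumps, [20]))
```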

 k Nearest Neighbor (kNN)
In the k-Nearest Neighbor Algorithm, the classification of a data point is determined by examining the closest data points around it, based on the distribution of the data. The term "k" refers to the number of nearest data points to be considered when determining the value of the target data point. Generally, "k" is chosen as an odd number to ensure a more robust decision-making process in the system [42]. As illustrated in Fig 3, the data point whose value is to be determined is classified into the category that encompasses the majority of its neighbors within a specified radius. Since "k" is typically an odd number, the possibility of a tie is avoided. Euclidean and Manhattan distance functions are commonly used to calculate the distances between data points.


Fig 3 k-Nearest Neighbors (kNN) [43]
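The kNN procedure just described can be sketched in a few lines: compute Euclidean distances, take the k closest training points, and return the majority label. The training points and labels below are toy assumptions, not the study's data.

```python
from collections import Counter

# Minimal kNN sketch: Euclidean distance, odd k, majority vote.
def knn_predict(train, x, k=3):
    """train: list of ([features], label); returns the majority label."""
    dist = lambda a, b: sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
    nearest = sorted(train, key=lambda row: dist(row[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [([1, 1], "Dead"), ([1, 2], "Dead"), ([2, 1], "Dead"),
         ([8, 8], "Alive"), ([8, 9], "Alive"), ([9, 8], "Alive")]
print(knn_predict(train, [2, 2]))   # 'Dead'
print(knn_predict(train, [8, 7]))   # 'Alive'
```

Using an odd k, as the text notes, avoids ties in the two-class vote.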

 Support Vector Machines (SVMs)
Similar to kNN, in Support Vector Machines (SVMs), all data points are plotted on an x-y axis based on their features. A curve is then fitted to separate the two features. This curve is generated by optimizing the features within the dataset and can be either linear or non-linear. The data points that fall above and below the curve are then classified accordingly. For example, if the curve is a circle, the data points can be grouped as those inside or outside the circle [44].

Fig 4 shows an example of Support Vector Machines with both linear and non-linear features created using two features on a two-axis system.
Fig 4 Support Vector Machines (SVMs) [45]
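The separating-line idea can be sketched with a linear SVM trained by Pegasos-style sub-gradient descent on the hinge loss. This is an illustrative stand-in, not the paper's MATLAB SVM; the toy clusters, the regularization constant `lam`, and the iteration count are all assumptions, and a constant 1.0 feature folds the intercept into the weight vector.

```python
import random

# Hedged sketch of a linear SVM via Pegasos-style hinge-loss updates.
def train_linear_svm(data, lam=0.01, iters=3000, seed=1):
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    for t in range(1, iters + 1):
        x, y = rng.choice(data)               # labels are +1 / -1
        step = 1.0 / (lam * t)
        w = [(1 - 1.0 / t) * wi for wi in w]  # regularization shrink
        if y * sum(wi * xi for wi, xi in zip(w, x)) < 1:
            w = [wi + step * y * xi for wi, xi in zip(w, x)]  # hinge update
    return w

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

# Two separable clusters; the last component is the constant bias feature.
data = [([1.0, 1.0, 1.0], -1), ([1.5, 2.0, 1.0], -1), ([2.0, 1.0, 1.0], -1),
        ([6.0, 6.0, 1.0], 1), ([7.0, 6.5, 1.0], 1), ([6.5, 7.0, 1.0], 1)]
w = train_linear_svm(data)
print(predict(w, [1.0, 1.5, 1.0]), predict(w, [7.0, 7.0, 1.0]))
```

Non-linear boundaries such as the circle mentioned above would additionally require a kernel, which this linear sketch omits.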

 Hybrid Artificial Intelligence
This method is based on the arithmetic average of the classification decisions generated by applying kNN, Decision Trees, and Support Vector Machines (SVMs) algorithms to the data. Essentially, the outcome of Hybrid Artificial Intelligence corresponds to the majority decision among these methods. The weaknesses of the individual classification algorithms do not affect the final decision in Hybrid Artificial Intelligence. The goal is to combine weak classifiers to form a strong classifier under the name of Hybrid Artificial Intelligence. The performance improves as the number of classification algorithms used in the Hybrid AI increases.

The hybrid method is quite similar to the Ensemble Decision Trees method. Another name for the hybrid method could be "Ensemble." Both methods aim to combine multiple artificial intelligence algorithms, typically 99 in number. In the Ensemble method, only decision trees are combined, whereas the goal of the hybrid method is to integrate different algorithms. The number of algorithms used in the hybrid method is typically odd (e.g., 1, 3, 5, 7, ... n). The primary distinction between Hybrid AI and Ensemble Decision Trees is that the former is not built solely from decision trees. Instead, Hybrid AI makes its decision by averaging the outcomes of different algorithms, without deriving a decision tree.


Fig 5 Hybrid Artificial Intelligence [46]

Consider an example where three classifiers are applied to data points 1, 2, and 3. In the Hybrid method, it is crucial that the number of classifiers, n, is odd, as an even number of classifiers can lead to an indeterminate result, making the Hybrid method ineffective.

For instance, for data point 1, the kNN method might classify the individual as sick, assigning label 1; the Conditional Decision Tree (Ctree) method might classify the individual as healthy, assigning label 2; and the SVM method might classify the individual as sick, assigning label 1. The Hybrid method calculates the average of these three labels, resulting in a value of 1.333333, as shown in Table 8. This value is then rounded, and the final classification for data point 1, based on the majority rule, is determined as sick.

In the Hybrid method, whether the average of the method labels or the majority rule is applied, the outcome is consistent.

The underlying principle of the Hybrid method is to adopt the decision that reflects the majority.

Table 8 Example Table of Hybrid Artificial Intelligence


Data kNN Ctree SVMs Hybrid Actual
1 1 2 1 1.333333 1
2 2 2 1 1.666667 2
3 2 1 1 1.333333 1
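The hybrid rule illustrated in Table 8 can be sketched directly: average the labels produced by kNN, Ctree, and SVMs and round to the nearest label, which for an odd number of classifiers coincides with the majority vote (1 = sick, 2 = healthy; an average of exactly 1.5 cannot occur with three voters).

```python
# Sketch of the hybrid decision rule from Table 8: round the mean label.
def hybrid_label(votes):
    return round(sum(votes) / len(votes))

# Rows of Table 8: (kNN, Ctree, SVMs) votes and the Actual label.
table8 = [((1, 2, 1), 1), ((2, 2, 1), 2), ((2, 1, 1), 1)]
decisions = [hybrid_label(votes) for votes, _ in table8]
print(decisions)  # matches the Actual column: [1, 2, 1]
```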

As shown in Table 8, each method used in the study demonstrates weaknesses when analyzing the labels produced. For instance, Ctree shows a weakness with label 1, SVMs with label 2, and kNN with label 1.

The primary objective of the Hybrid method is to combine weak classifiers (Ctree, SVMs, and kNN) to form a stronger classifier. When each method is applied individually to data points 1, 2, and 3, each method correctly classifies two out of the three data points. However, when the Hybrid method is applied, it correctly classifies all three data points according to the actual values. Thus, the Hybrid method aims to aggregate weak classifiers into a robust classifier.

 This Concept can be Explained with the Following Example:
Consider a table with four legs. If one person, lacking sufficient strength, attempts to lift the table, they are unable to do so. However, if four individuals, each with similar strength, lift the table together, they succeed. While the table may be unstable when lifted by one person, the likelihood of it falling decreases when four people lift it together. The


principle of the Hybrid method is analogous: as the number of classifiers increases, the performance of the Hybrid method correspondingly improves.

F. Performance Evaluation Criteria
The study utilized performance evaluation criteria such as accuracy, specificity, sensitivity, kappa statistic, and F-measure [27], [28]. These criteria were applied to the Ensemble, kNN, SVMs, and Hybrid Artificial Intelligence classifiers.

The training-test split ratio for dataset classification was set at 50%-50%, as shown in Table 9.

Table 9 Training-Test Dataset Distribution (Sample Distribution Table)

A Dataset
Class | Training (50%) | Test (50%) | Total
Alive (Group 1) | 284 | 284 | 568
Dead | 308 | 308 | 616
Total | 592 | 592 | 1184

B Dataset
Class | Training (50%) | Test (50%) | Total
Alive (Group 2) | 284 | 284 | 568
Dead | 308 | 308 | 616
Total | 592 | 592 | 1184

C Dataset
Class | Training (50%) | Test (50%) | Total
Alive (Group 3) | 284 | 284 | 568
Dead | 308 | 308 | 616
Total | 592 | 592 | 1184

D Dataset
Class | Training (50%) | Test (50%) | Total
Alive (Group 4) | 284 | 284 | 568
Dead | 308 | 308 | 616
Total | 592 | 592 | 1184

E Dataset
Class | Training (50%) | Test (50%) | Total
Alive (Group 5) | 284 | 284 | 568
Dead | 308 | 308 | 616
Total | 592 | 592 | 1184

F Dataset
Class | Training (50%) | Test (50%) | Total
Alive (Group 6) | 284 | 284 | 568
Dead | 308 | 308 | 616
Total | 592 | 592 | 1184
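The evaluation criteria named above can be sketched from a binary confusion matrix. The formulas are the standard definitions of accuracy, sensitivity, specificity, F-measure, and Cohen's kappa; the example counts are illustrative (chosen to sum to a 592-record test set as in Table 9), not the paper's reported results.

```python
# Sketch of the performance criteria from a confusion matrix (tp, fp, fn, tn),
# with the diseased (Dead) class treated as the positive class.
def metrics(tp, fp, fn, tn):
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)              # recall on the positive class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    p_e = p_yes + p_no                        # chance agreement for kappa
    kappa = (accuracy - p_e) / (1 - p_e)
    return accuracy, sensitivity, specificity, f_measure, kappa

acc, sens, spec, f1, kap = metrics(tp=290, fp=8, fn=0, tn=294)
print(round(acc, 3), round(sens, 3), round(spec, 3))
```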

III. RESULTS

The objective of this study is to develop a rule-based diagnostic algorithm for breast cancer using artificial intelligence methods. The dataset used in the study includes diagnostic outcomes for individuals based on 14 clinical findings. The Eta feature selection algorithm was applied to the dataset, ranking the 14 features from the most significant to the least. Based on this ranking, 14 different subsets of the dataset were created. Due to the imbalance in the dataset concerning the class variable, these subsets underwent data balancing procedures, resulting in 84 additional subsets. These 84 subsets were generated by evaluating six different datasets derived from the original class dataset, with each dataset being analyzed separately based on the 14 ranked features. All balanced datasets were classified using Ensemble, kNN, SVMs, and Hybrid Artificial Intelligence methods. Based on the results, a diagnostic algorithm for breast cancer was developed.

According to Table 10 and Table 11, the performance evaluation criteria (accuracy, sensitivity, specificity, F1-score, kappa, and AUC) for the A Dataset in Table 9, across all classification models (SVMs, kNN, Ensemble, and Hybrid), initially decreased when ranked from the best to the top 7 features, and then increased as the number of features expanded to 14. Overall, when examining the ranking of the 14 features, the Ensemble method was identified as the most successful classification approach. The highest performance evaluation metrics within the Ensemble classification method were observed in the dataset containing the top 13 features. In Fig 6, it is visually evident that the Ensemble classification method occupies the largest area when compared to other methods, especially when considering the dataset with the top 13 features.

According to Table 12 and Table 13, the performance evaluation criteria (accuracy, sensitivity, specificity, F1-score, kappa, and AUC) for the B Dataset in Table 9, across


all classification models (SVMs, kNN, Ensemble, and Hybrid)—initially decreased when ranked from the best to the top 13 features, and then increased at the 14th feature. Overall, when examining the ranking of the 14 features, the Ensemble method was identified as the most successful classification approach. However, alongside the Ensemble method, the kNN and Hybrid classification methods also demonstrated equally high performance metrics within the dataset containing the top 1 feature. In Fig 7, it is visually evident that the Ensemble, kNN, and Hybrid classification methods occupy the largest area when considering the dataset with the top 1 feature, compared to other methods.

According to Table 14 and Table 15, the performance evaluation criteria (accuracy, sensitivity, specificity, F1-score, kappa, and AUC) for the C Dataset in Table 9—across all classification models (SVMs, kNN, Ensemble, and Hybrid)—initially decreased when ranked from the best to the top 7 features, and then increased as the number of features expanded to 14. Overall, when examining the ranking of the 14 features, the Ensemble method was identified as the most successful classification approach. However, alongside the Ensemble method, the kNN and Hybrid classification methods also demonstrated equally high performance metrics within the dataset containing the top 1 feature. In Fig 8, it is visually evident that the Ensemble, kNN, and Hybrid classification methods occupy the largest area when considering the dataset with the top 1 feature, compared to other methods.

According to Table 16 and Table 17, the performance evaluation criteria (accuracy, sensitivity, specificity, F1-score, kappa, and AUC) for the D Dataset in Table 9—across all classification models (SVMs, kNN, Ensemble, and Hybrid)—initially decreased when ranked from the best to the top 7 features, and then increased as the number of features expanded to 14. Overall, when examining the ranking of the 14 features, the Hybrid method was identified as the most successful classification approach. However, alongside the Hybrid method, the kNN and Ensemble classification methods also demonstrated equally high performance metrics within the dataset containing the top 1 feature. In Fig 9, it is visually evident that the Ensemble, kNN, and Hybrid classification methods occupy the largest area when considering the dataset with the top 1 feature, compared to other methods.

According to Table 18 and Table 19, the performance evaluation criteria (accuracy, sensitivity, specificity, F1-score, kappa, and AUC) for the E Dataset in Table 9—across all classification models (SVMs, kNN, Ensemble, and Hybrid)—initially decreased when ranked from the best to the top 7 features, and then increased as the number of features expanded to 14. Overall, when examining the ranking of the 14 features, the SVMs method was identified as the most successful classification approach. However, within the Ensemble classification method, the highest performance metrics were observed in the dataset containing the top 1 feature. In Fig 10, it is visually evident that the Ensemble classification method occupies the largest area when considering the dataset with the top 1 feature, compared to other methods.

According to Table 20 and Table 21, the performance evaluation criteria (accuracy, sensitivity, specificity, F1-score, kappa, and AUC) for the F Dataset in Table 9—across all classification models (SVMs, kNN, Ensemble, and Hybrid)—initially decreased when ranked from the best to the top 7 features, and then increased as the number of features expanded to 14. Overall, when examining the ranking of the 14 features, the Ensemble method was identified as the most successful classification approach. Within the Ensemble classification method, the highest performance metrics were observed in the dataset containing the top 2 features. In Fig 11, it is visually evident that the Ensemble classification method occupies the largest area when considering the dataset with the top 2 features, compared to other methods.

According to Table 9, when analyzing the methods applied across all datasets, it is observed that the sensitivity value is higher than the accuracy value, indicating a higher rate of disease detection. This suggests that the system is effective in identifying individuals with breast cancer. The system is particularly effective in the early detection of breast cancer in affected individuals. As shown in Table 18, Table 19, and Fig 10, the Ensemble method applied to the E Dataset, which utilizes the top 1 feature, along with the SVMs classification method applied to the datasets with the top 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, and 13 features, all achieve a sensitivity rate of 100%, indicating the highest and most consistent sensitivity performance. However, when considering the overall system performance across all datasets, the specificity value is observed to be lower than the sensitivity value. This implies that the system is less effective in identifying healthy individuals compared to its ability to detect cancerous cases. Consequently, the likelihood of a breast cancer patient being misclassified as healthy is low, and the system prioritizes the accurate detection of cancerous patients.

In the overall application, the accuracy rate of the system is generally above 90% across all datasets, as shown in Table 9 Training-Test Dataset Distribution (Sample Distribution Table). This high accuracy rate suggests that the system is highly suitable for use in the healthcare field. Specifically, the F Dataset, as shown in Table 20, Table 21, and Fig 11, predominantly achieves the highest accuracy rates within the Ensemble classification method. The dataset using the Ensemble classification method with the top 2 features in the F Dataset achieves the highest accuracy rate of 99.3%. In the study, feature extraction was performed using the best single feature, as demonstrated in Table 12, Table 13, and Fig 7; Table 14, Table 15, and Fig 8; Table 16, Table 17, and Fig 9; and Table 18, Table 19, and Fig 10. Additionally, feature extraction using the best two features was shown in Table 20, Table 21, and Fig 11. These results demonstrate and visualize that the system can achieve efficient outcomes with reduced computational workload by extracting fewer features.
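All of the evaluation criteria reported in Tables 10–21 derive from the binary confusion matrix. The following is a minimal sketch of how each value can be computed; note that reading the tables' AUC column as balanced accuracy, (sensitivity + specificity)/2, is our inference from the reported values (e.g., sensitivity 79.6 and specificity 84.4 yielding AUC 82.0 in Table 10), not a definition stated by the study.

```python
def binary_metrics(tp, fn, tn, fp):
    """Confusion-matrix metrics as used in Tables 10-21 (as fractions, not %)."""
    n = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)          # recall on the cancer class
    specificity = tn / (tn + fp)          # recall on the healthy class
    accuracy = (tp + tn) / n
    f1 = 2 * tp / (2 * tp + fp + fn)      # harmonic mean of precision and recall
    # Cohen's kappa: observed agreement vs. chance agreement
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    balanced_acc = (sensitivity + specificity) / 2  # matches the tables' AUC column
    return dict(accuracy=accuracy, sensitivity=sensitivity,
                specificity=specificity, f1=f1, kappa=kappa, auc=balanced_acc)

m = binary_metrics(tp=90, fn=10, tn=80, fp=20)
print({k: round(v, 3) for k, v in m.items()})
# → {'accuracy': 0.85, 'sensitivity': 0.9, 'specificity': 0.8, 'f1': 0.857, 'kappa': 0.7, 'auc': 0.85}
```

The counts in the example are synthetic; the study reports only the derived percentages, not the underlying confusion matrices.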


Table 10 A Dataset Summary Table – Best Methods


L FN FP Model Accuracy Sensitivity Specificity F1 Score Kappa AUC
SVMs 100.0 100.0 NaN NaN 0.0 50.0
kNN 72.5 97.5 49.4 65.5 45.9 73.4
1 1 7.14 Ensemble 82.1 79.6 84.4 81.9 64.1 82.0
Hybrid 82.1 79.6 84.4 81.9 64.1 82.0
SVMs 63.7 67.3 60.4 63.6 27.5 63.8
kNN 76.2 91.9 61.7 73.8 52.9 76.8
2 2 14.28 Ensemble 80.4 82.0 78.9 80.4 60.8 80.5
Hybrid 79.4 87.7 71.8 78.9 59.0 79.7
SVMs 59.3 87.7 33.1 48.1 20.3 60.4
kNN 78.4 85.6 71.8 78.1 57.0 78.7
3 3 21.42 Ensemble 79.2 78.9 79.5 79.2 58.4 79.2
Hybrid 78.4 84.2 73.1 78.2 56.9 78.6
SVMs 63.3 74.3 53.2 62.0 27.3 63.8
kNN 78.0 84.9 71.8 77.8 56.3 78.3
4 4 28.56 Ensemble 78.4 79.9 76.9 78.4 56.8 78.4
Hybrid 78.2 82.7 74.0 78.1 56.5 78.4
SVMs 61.1 86.3 38.0 52.7 23.8 62.1
kNN 77.9 85.2 71.1 77.5 56.0 78.2
5 5 35.7 Ensemble 79.4 81.0 77.9 79.4 58.8 79.5
Hybrid 78.7 84.5 73.4 78.5 57.6 78.9
SVMs 64.9 81.0 50.0 61.8 30.6 65.5
kNN 78.2 86.6 70.5 77.7 56.7 78.5
6 6 42.84 Ensemble 79.2 79.6 78.9 79.2 58.4 79.2
Hybrid 79.2 83.8 75.0 79.2 58.6 79.4
SVMs 66.9 81.0 53.9 64.7 34.5 67.4
kNN 77.7 85.6 70.5 77.3 55.6 78.0
7 7 49.98 Ensemble 77.5 76.8 78.2 77.5 55.0 77.5
Hybrid 78.5 81.7 75.6 78.6 57.2 78.7

Table 11 A Dataset Summary Table – Best Methods (Continue)


L FN FP Model Accuracy Sensitivity Specificity F1 Score Kappa AUC
SVMs 66.7 78.2 56.2 65.4 34.0 67.2
kNN 77.0 83.1 71.4 76.8 54.2 77.3
8 8 57.12 Ensemble 78.5 78.5 78.6 78.5 57.1 78.5
Hybrid 78.9 81.0 76.9 78.9 57.8 79.0
SVMs 66.4 79.2 54.5 64.6 33.4 66.9
kNN 76.9 81.0 73.1 76.8 53.8 77.0
9 9 64.26 Ensemble 81.1 82.7 79.5 81.1 62.2 81.1
Hybrid 80.6 84.2 77.3 80.6 61.2 80.7
SVMs 66.2 78.9 54.5 64.5 33.1 66.7
kNN 76.9 81.0 73.1 76.8 53.8 77.0
10 10 71.4 Ensemble 81.1 82.4 79.9 81.1 62.2 81.1
Hybrid 80.9 83.8 78.2 80.9 61.9 81.0
SVMs 66.2 77.8 55.5 64.8 33.0 66.7
kNN 74.8 77.8 72.1 74.8 49.7 74.9
11 11 78.54 Ensemble 81.4 82.4 80.5 81.4 62.8 81.5
Hybrid 79.7 83.1 76.6 79.7 59.5 79.9
SVMs 67.4 78.2 57.5 66.2 35.3 67.8
kNN 71.6 74.6 68.8 71.6 43.3 71.7
12 12 85.68 Ensemble 82.3 85.2 79.5 82.3 64.6 82.4
Hybrid 79.1 83.8 74.7 79.0 58.2 79.2
SVMs 68.6 79.2 58.8 67.5 37.6 69.0
kNN 72.3 76.1 68.8 72.3 44.7 72.4
13 13 92.82 Ensemble 83.1 86.3 80.2 83.1 66.3 83.2
Hybrid 80.7 85.2 76.6 80.7 61.6 80.9
SVMs 68.2 77.8 59.4 67.4 36.9 68.6
kNN 71.8 75.4 68.5 71.8 43.7 71.9
14 14 99.96 Ensemble 83.1 85.2 81.2 83.1 66.2 83.2
Hybrid 80.1 82.7 77.6 80.1 60.2 80.2


Table 12 B Dataset Summary Table – Best Methods


L FN FP Model Accuracy Sensitivity Specificity F1 Score Kappa AUC
SVMs 78.2 100.0 58.1 73.5 57.1 79.1
kNN 93.4 100.0 87.3 93.2 86.9 93.7
1 1 7.14 Ensemble 93.4 100.0 87.3 93.2 86.9 93.7
Hybrid 93.4 100.0 87.3 93.2 86.9 93.7
SVMs 78.0 93.7 63.6 75.8 56.6 78.6
kNN 93.6 100.0 87.7 93.4 87.2 93.8
2 2 14.28 Ensemble 93.4 97.2 89.9 93.4 86.8 93.6
Hybrid 93.9 99.3 89.0 93.8 87.9 94.1
SVMs 77.4 92.6 63.3 75.2 55.2 78.0
kNN 92.4 95.4 89.6 92.4 84.8 92.5
3 3 21.42 Ensemble 91.6 92.3 90.9 91.6 83.1 91.6
Hybrid 92.2 94.7 89.9 92.3 84.5 92.3
SVMs 76.5 93.0 61.4 73.9 53.6 77.2
kNN 92.4 95.4 89.6 92.4 84.8 92.5
4 4 28.56 Ensemble 91.7 93.3 90.3 91.8 83.4 91.8
Hybrid 92.6 95.8 89.6 92.6 85.2 92.7
SVMs 76.4 93.0 61.0 73.7 53.3 77.0
kNN 92.1 95.4 89.0 92.1 84.1 92.2
5 5 35.7 Ensemble 90.7 91.5 89.9 90.7 81.4 90.7
Hybrid 91.6 94.4 89.0 91.6 83.1 91.7
SVMs 76.4 92.3 61.7 73.9 53.2 77.0
kNN 91.4 94.4 88.6 91.4 82.8 91.5
6 6 42.84 Ensemble 90.9 91.9 89.9 90.9 81.7 90.9
Hybrid 91.0 93.7 88.6 91.1 82.1 91.1
SVMs 76.2 89.8 63.6 74.5 52.8 76.7
kNN 91.0 93.3 89.0 91.1 82.1 91.1
7 7 49.98 Ensemble 88.9 88.0 89.6 88.8 77.7 88.8
Hybrid 90.2 91.9 88.6 90.2 80.4 90.3

Table 13 B Dataset Summary Table – Best Methods (Continue)


L FN FP Model Accuracy Sensitivity Specificity F1 Score Kappa AUC
SVMs 76.2 89.8 63.6 74.5 52.8 76.7
kNN 91.2 93.7 89.0 91.3 82.4 91.3
8 8 57.12 Ensemble 89.4 89.4 89.3 89.4 78.7 89.4
Hybrid 90.4 92.3 88.6 90.4 80.7 90.4
SVMs 76.0 90.5 62.7 74.0 52.5 76.6
kNN 87.8 90.5 85.4 87.9 75.7 87.9
9 9 64.26 Ensemble 91.9 94.4 89.6 91.9 83.8 92.0
Hybrid 91.2 96.1 86.7 91.2 82.5 91.4
SVMs 76.0 90.5 62.7 74.0 52.5 76.6
kNN 87.8 90.5 85.4 87.9 75.7 87.9
10 10 71.4 Ensemble 91.4 93.7 89.3 91.4 82.8 91.5
Hybrid 90.7 95.8 86.0 90.6 81.5 90.9
SVMs 76.4 88.7 64.9 75.0 53.1 76.8
kNN 86.7 89.8 83.8 86.7 73.3 86.8
11 11 78.54 Ensemble 91.2 93.7 89.0 91.3 82.4 91.3
Hybrid 88.9 93.0 85.1 88.8 77.7 89.0
SVMs 75.8 89.1 63.6 74.2 52.1 76.4
kNN 83.8 87.0 80.8 83.8 67.6 83.9
12 12 85.68 Ensemble 91.7 94.4 89.3 91.8 83.5 91.8
Hybrid 88.7 94.4 83.4 88.6 77.4 88.9
SVMs 76.2 89.1 64.3 74.7 52.8 76.7
kNN 83.8 86.6 81.2 83.8 67.6 83.9
13 13 92.82 Ensemble 91.4 94.4 88.6 91.4 82.8 91.5
Hybrid 88.2 92.6 84.1 88.1 76.4 88.3
SVMs 75.3 88.7 63.0 73.7 51.1 75.9
kNN 83.4 85.9 81.2 83.5 66.9 83.5
14 14 99.96 Ensemble 92.1 94.7 89.6 92.1 84.1 92.2
Hybrid 88.2 93.0 83.8 88.1 76.4 88.4


Table 14 C Dataset Summary Table – Best Methods


L FN FP Model Accuracy Sensitivity Specificity F1 Score Kappa AUC
SVMs 86.7 100.0 74.4 85.3 73.6 87.2
kNN 96.1 100.0 92.5 96.1 92.2 96.3
1 1 7.14 Ensemble 96.1 100.0 92.5 96.1 92.2 96.3
Hybrid 96.1 100.0 92.5 96.1 92.2 96.3
SVMs 86.5 100.0 74.0 85.1 73.2 87.0
kNN 95.6 99.6 91.9 95.6 91.2 95.8
2 2 14.28 Ensemble 95.9 99.3 92.9 96.0 91.9 96.1
Hybrid 95.6 99.6 91.9 95.6 91.2 95.8
SVMs 85.6 98.6 73.7 84.3 71.5 86.1
kNN 94.8 97.5 92.2 94.8 89.5 94.9
3 3 21.42 Ensemble 94.8 96.1 93.5 94.8 89.5 94.8
Hybrid 94.6 97.2 92.2 94.6 89.2 94.7
SVMs 85.3 97.9 73.7 84.1 70.9 85.8
kNN 94.8 97.5 92.2 94.8 89.5 94.9
4 4 28.56 Ensemble 94.8 96.1 93.5 94.8 89.5 94.8
Hybrid 94.6 97.2 92.2 94.6 89.2 94.7
SVMs 85.3 97.9 73.7 84.1 70.9 85.8
kNN 94.4 96.8 92.2 94.5 88.9 94.5
5 5 35.7 Ensemble 94.8 96.5 93.2 94.8 89.5 94.8
Hybrid 94.6 97.2 92.2 94.6 89.2 94.7
SVMs 85.5 97.9 74.0 84.3 71.2 86.0
kNN 94.4 96.5 92.5 94.5 88.9 94.5
6 6 42.84 Ensemble 93.9 94.0 93.8 93.9 87.8 93.9
Hybrid 94.4 96.5 92.5 94.5 88.9 94.5
SVMs 86.8 97.9 76.6 86.0 73.8 87.3
kNN 93.4 95.1 91.9 93.4 86.8 93.5
7 7 49.98 Ensemble 92.4 91.5 93.2 92.4 84.8 92.4
Hybrid 93.8 95.8 91.9 93.8 87.5 93.8

Table 15 C Dataset Summary Table – Best Methods (Continue)


L FN FP Model Accuracy Sensitivity Specificity F1 Score Kappa AUC
SVMs 86.7 97.5 76.6 85.8 73.5 87.1
kNN 93.6 94.7 92.5 93.6 87.2 93.6
8 8 57.12 Ensemble 93.6 94.0 93.2 93.6 87.1 93.6
Hybrid 93.9 96.1 91.9 94.0 87.8 94.0
SVMs 85.8 96.5 76.0 85.0 71.8 86.2
kNN 91.9 94.4 89.6 91.9 83.8 92.0
9 9 64.26 Ensemble 95.3 97.2 93.5 95.3 90.5 95.3
Hybrid 94.1 97.5 90.9 94.1 88.2 94.2
SVMs 86.0 96.8 76.0 85.1 72.2 86.4
kNN 91.9 94.4 89.6 91.9 83.8 92.0
10 10 71.4 Ensemble 95.1 97.2 93.2 95.1 90.2 95.2
Hybrid 93.8 97.2 90.6 93.8 87.5 93.9
SVMs 86.5 96.1 77.6 85.9 73.1 86.9
kNN 89.5 93.0 86.4 89.5 79.1 89.7
11 11 78.54 Ensemble 95.4 96.8 94.2 95.5 90.9 95.5
Hybrid 92.9 98.2 88.0 92.8 85.8 93.1
SVMs 86.3 96.8 76.6 85.6 72.8 86.7
kNN 89.7 94.0 85.7 89.7 79.4 89.9
12 12 85.68 Ensemble 95.9 98.2 93.8 96.0 91.9 96.0
Hybrid 92.9 98.9 87.3 92.8 85.9 93.1
SVMs 86.1 96.8 76.3 85.3 72.5 86.6
kNN 89.5 94.0 85.4 89.5 79.1 89.7
13 13 92.82 Ensemble 95.9 98.2 93.8 96.0 91.9 96.0
Hybrid 92.9 98.9 87.3 92.8 85.9 93.1
SVMs 86.0 96.5 76.3 85.2 72.2 86.4
kNN 89.5 94.0 85.4 89.5 79.1 89.7
14 14 99.96 Ensemble 95.8 98.2 93.5 95.8 91.6 95.9
Hybrid 92.7 98.9 87.0 92.6 85.5 93.0


Table 16 D Dataset Summary Table – Best Methods


L FN FP Model Accuracy Sensitivity Specificity F1 Score Kappa AUC
SVMs 91.2 100.0 83.1 90.8 82.5 91.6
kNN 95.9 100.0 92.2 95.9 91.9 96.1
1 1 7.14 Ensemble 95.9 100.0 92.2 95.9 91.9 96.1
Hybrid 95.9 100.0 92.2 95.9 91.9 96.1
SVMs 91.4 100.0 83.4 91.0 82.9 91.7
kNN 95.3 100.0 90.9 95.2 90.6 95.5
2 2 14.28 Ensemble 95.9 100.0 92.2 95.9 91.9 96.1
Hybrid 95.3 100.0 90.9 95.2 90.6 95.5
SVMs 91.4 100.0 83.4 91.0 82.9 91.7
kNN 93.4 95.1 91.9 93.4 86.8 93.5
3 3 21.42 Ensemble 93.1 93.0 93.2 93.1 86.1 93.1
Hybrid 93.4 95.4 91.6 93.5 86.8 93.5
SVMs 91.4 100.0 83.4 91.0 82.9 91.7
kNN 93.6 95.1 92.2 93.6 87.2 93.6
4 4 28.56 Ensemble 93.6 94.0 93.2 93.6 87.1 93.6
Hybrid 93.6 95.4 91.9 93.6 87.2 93.7
SVMs 91.4 100.0 83.4 91.0 82.9 91.7
kNN 93.2 94.7 91.9 93.3 86.5 93.3
5 5 35.7 Ensemble 93.1 93.3 92.9 93.1 86.1 93.1
Hybrid 93.2 95.1 91.6 93.3 86.5 93.3
SVMs 91.4 100.0 83.4 91.0 82.9 91.7
kNN 93.1 94.4 91.9 93.1 86.1 93.1
6 6 42.84 Ensemble 92.4 91.9 92.9 92.4 84.8 92.4
Hybrid 93.1 94.7 91.6 93.1 86.1 93.1
SVMs 91.4 100.0 83.4 91.0 82.9 91.7
kNN 93.1 94.4 91.9 93.1 86.1 93.1
7 7 49.98 Ensemble 92.4 91.5 93.2 92.4 84.8 92.4
Hybrid 93.2 94.7 91.9 93.3 86.5 93.3

Table 17 D Dataset Summary Table – Best Methods (Continue)


L FN FP Model Accuracy Sensitivity Specificity F1 Score Kappa AUC
SVMs 91.6 100.0 83.8 91.2 83.2 91.9
kNN 93.2 94.4 92.2 93.3 86.5 93.3
8 8 57.12 Ensemble 93.2 93.3 93.2 93.2 86.5 93.2
Hybrid 93.8 95.4 92.2 93.8 87.5 93.8
SVMs 91.4 100.0 83.4 91.0 82.9 91.7
kNN 93.1 96.8 89.6 93.1 86.2 93.2
9 9 64.26 Ensemble 93.9 94.7 93.2 93.9 87.8 94.0
Hybrid 93.8 98.2 89.6 93.7 87.5 93.9
SVMs 91.6 100.0 83.8 91.2 83.2 91.9
kNN 93.1 96.8 89.6 93.1 86.2 93.2
10 10 71.4 Ensemble 93.6 94.7 92.5 93.6 87.2 93.6
Hybrid 94.1 98.6 89.9 94.1 88.2 94.3
SVMs 91.7 99.6 84.4 91.4 83.5 92.0
kNN 90.4 93.0 88.0 90.4 80.8 90.5
11 11 78.54 Ensemble 93.6 94.4 92.9 93.6 87.2 93.6
Hybrid 92.9 97.9 88.3 92.9 85.8 93.1
SVMs 91.7 99.6 84.4 91.4 83.5 92.0
kNN 90.0 92.6 87.7 90.1 80.1 90.1
12 12 85.68 Ensemble 94.6 96.1 93.2 94.6 89.2 94.7
Hybrid 93.1 98.2 88.3 93.0 86.2 93.3
SVMs 91.7 99.6 84.4 91.4 83.5 92.0
kNN 90.0 93.0 87.3 90.1 80.1 90.1
13 13 92.82 Ensemble 93.8 94.7 92.9 93.8 87.5 93.8
Hybrid 92.9 98.2 88.0 92.8 85.8 93.1
SVMs 91.9 99.6 84.7 91.6 83.9 92.2
kNN 90.0 93.0 87.3 90.1 80.1 90.1
14 14 99.96 Ensemble 94.4 96.1 92.9 94.5 88.8 94.5
Hybrid 92.7 97.9 88.0 92.7 85.5 92.9


Table 18 E Dataset Summary Table – Best Methods


L FN FP Model Accuracy Sensitivity Specificity F1 Score Kappa AUC
SVMs 96.1 100.0 92.5 96.1 92.2 96.3
kNN 97.3 100.0 94.8 97.3 94.6 97.4
1 1 7.14 Ensemble 97.5 100.0 95.1 97.5 94.9 97.6
Hybrid 97.3 100.0 94.8 97.3 94.6 97.4
SVMs 96.1 100.0 92.5 96.1 92.2 96.3
kNN 96.8 98.9 94.8 96.8 93.6 96.9
2 2 14.28 Ensemble 97.1 98.9 95.5 97.2 94.3 97.2
Hybrid 96.8 98.9 94.8 96.8 93.6 96.9
SVMs 96.1 100.0 92.5 96.1 92.2 96.3
kNN 95.6 95.8 95.5 95.6 91.2 95.6
3 3 21.42 Ensemble 95.6 95.1 96.1 95.6 91.2 95.6
Hybrid 95.4 95.8 95.1 95.5 90.9 95.5
SVMs 96.1 100.0 92.5 96.1 92.2 96.3
kNN 95.4 95.4 95.5 95.4 90.9 95.4
4 4 28.56 Ensemble 95.9 95.8 96.1 95.9 91.9 95.9
Hybrid 95.4 95.8 95.1 95.5 90.9 95.5
SVMs 96.1 100.0 92.5 96.1 92.2 96.3
kNN 95.3 95.4 95.1 95.3 90.5 95.3
5 5 35.7 Ensemble 95.4 95.1 95.8 95.4 90.9 95.4
Hybrid 95.3 95.8 94.8 95.3 90.5 95.3
SVMs 96.1 100.0 92.5 96.1 92.2 96.3
kNN 95.6 96.5 94.8 95.6 91.2 95.6
6 6 42.84 Ensemble 95.8 95.8 95.8 95.8 91.5 95.8
Hybrid 95.6 96.8 94.5 95.6 91.2 95.7
SVMs 96.3 100.0 92.9 96.3 92.6 96.4
kNN 95.4 95.8 95.1 95.5 90.9 95.5
7 7 49.98 Ensemble 95.1 94.4 95.8 95.1 90.2 95.1
Hybrid 95.3 96.1 94.5 95.3 90.5 95.3

Table 19 E Dataset Summary Table – Best Methods (Continue)


L FN FP Model Accuracy Sensitivity Specificity F1 Score Kappa AUC
SVMs 96.3 100.0 92.9 96.3 92.6 96.4
kNN 95.3 95.4 95.1 95.3 90.5 95.3
8 8 57.12 Ensemble 95.4 95.1 95.8 95.4 90.9 95.4
Hybrid 95.3 96.1 94.5 95.3 90.5 95.3
SVMs 95.9 100.0 92.2 95.9 91.9 96.1
kNN 93.2 95.1 91.6 93.3 86.5 93.3
9 9 64.26 Ensemble 94.9 95.4 94.5 94.9 89.9 95.0
Hybrid 94.8 97.5 92.2 94.8 89.5 94.9
SVMs 95.9 100.0 92.2 95.9 91.9 96.1
kNN 93.2 95.1 91.6 93.3 86.5 93.3
10 10 71.4 Ensemble 94.8 95.1 94.5 94.8 89.5 94.8
Hybrid 94.8 97.5 92.2 94.8 89.5 94.9
SVMs 95.9 100.0 92.2 95.9 91.9 96.1
kNN 93.1 96.1 90.3 93.1 86.2 93.2
11 11 78.54 Ensemble 95.4 96.1 94.8 95.5 90.9 95.5
Hybrid 95.6 99.3 92.2 95.6 91.2 95.8
SVMs 96.1 100.0 92.5 96.1 92.2 96.3
kNN 91.0 93.7 88.6 91.1 82.1 91.1
12 12 85.68 Ensemble 95.8 97.5 94.2 95.8 91.6 95.8
Hybrid 95.8 99.3 92.5 95.8 91.6 95.9
SVMs 95.9 100.0 92.2 95.9 91.9 96.1
kNN 91.2 94.0 88.6 91.2 82.4 91.3
13 13 92.82 Ensemble 95.9 97.9 94.2 96.0 91.9 96.0
Hybrid 95.4 98.9 92.2 95.5 90.9 95.6
SVMs 95.8 99.6 92.2 95.8 91.6 95.9
kNN 91.6 94.7 88.6 91.6 83.1 91.7
14 14 99.96 Ensemble 96.5 98.9 94.2 96.5 92.9 96.5
Hybrid 95.8 99.6 92.2 95.8 91.6 95.9


Table 20 F Dataset Summary Table – Best Methods


L FN FP Model Accuracy Sensitivity Specificity F1 Score Kappa AUC
SVMs 98.8 100.0 97.7 98.9 97.6 98.9
kNN 99.2 100.0 98.4 99.2 98.3 99.2
1 1 7.14 Ensemble 77.7 54.9 98.7 70.6 54.6 76.8
Hybrid 99.2 100.0 98.4 99.2 98.3 99.2
SVMs 98.8 100.0 97.7 98.9 97.6 98.9
kNN 99.0 100.0 98.1 99.0 98.0 99.0
2 2 14.28 Ensemble 99.3 100.0 98.7 99.3 98.6 99.4
Hybrid 99.0 100.0 98.1 99.0 98.0 99.0
SVMs 98.8 100.0 97.7 98.9 97.6 98.9
kNN 98.8 99.6 98.1 98.8 97.6 98.8
3 3 21.42 Ensemble 99.2 99.6 98.7 99.2 98.3 99.2
Hybrid 98.8 99.6 98.1 98.8 97.6 98.8
SVMs 98.8 100.0 97.7 98.9 97.6 98.9
kNN 98.8 99.6 98.1 98.8 97.6 98.8
4 4 28.56 Ensemble 99.2 99.6 98.7 99.2 98.3 99.2
Hybrid 98.8 99.6 98.1 98.8 97.6 98.8
SVMs 98.8 100.0 97.7 98.9 97.6 98.9
kNN 98.8 99.6 98.1 98.8 97.6 98.8
5 5 35.7 Ensemble 99.0 99.3 98.7 99.0 98.0 99.0
Hybrid 98.8 99.6 98.1 98.8 97.6 98.8
SVMs 98.8 100.0 97.7 98.9 97.6 98.9
kNN 98.6 99.6 97.7 98.7 97.3 98.7
6 6 42.84 Ensemble 99.0 99.3 98.7 99.0 98.0 99.0
Hybrid 98.6 99.6 97.7 98.7 97.3 98.7
SVMs 98.8 100.0 97.7 98.9 97.6 98.9
kNN 98.6 99.6 97.7 98.7 97.3 98.7
7 7 49.98 Ensemble 98.5 99.3 97.7 98.5 97.0 98.5
Hybrid 98.6 99.6 97.7 98.7 97.3 98.7

Table 21 F Dataset Summary Table – Best Methods (Continue)


L FN FP Model Accuracy Sensitivity Specificity F1 Score Kappa AUC
SVMs 98.6 99.6 97.7 98.7 97.3 98.7
kNN 98.3 98.9 97.7 98.3 96.6 98.3
8 8 57.12 Ensemble 97.6 97.5 97.7 97.6 95.3 97.6
Hybrid 98.5 99.3 97.7 98.5 97.0 98.5
SVMs 98.6 99.6 97.7 98.7 97.3 98.7
kNN 97.5 97.5 97.4 97.5 94.9 97.5
9 9 64.26 Ensemble 99.0 99.3 98.7 99.0 98.0 99.0
Hybrid 98.5 99.3 97.7 98.5 97.0 98.5
SVMs 98.6 99.6 97.7 98.7 97.3 98.7
kNN 97.5 97.5 97.4 97.5 94.9 97.5
10 10 71.4 Ensemble 99.0 99.3 98.7 99.0 98.0 99.0
Hybrid 98.5 99.3 97.7 98.5 97.0 98.5
SVMs 98.6 99.6 97.7 98.7 97.3 98.7
kNN 97.3 98.9 95.8 97.3 94.6 97.4
11 11 78.54 Ensemble 99.2 99.6 98.7 99.2 98.3 99.2
Hybrid 98.8 100.0 97.7 98.9 97.6 98.9
SVMs 98.5 99.3 97.7 98.5 97.0 98.5
kNN 96.3 97.2 95.5 96.3 92.6 96.3
12 12 85.68 Ensemble 99.2 99.6 98.7 99.2 98.3 99.2
Hybrid 98.8 100.0 97.7 98.9 97.6 98.9
SVMs 98.5 99.3 97.7 98.5 97.0 98.5
kNN 96.6 97.9 95.5 96.7 93.2 96.7
13 13 92.82 Ensemble 98.0 97.2 98.7 97.9 95.9 97.9
Hybrid 98.8 100.0 97.7 98.9 97.6 98.9
SVMs 98.6 99.6 97.7 98.7 97.3 98.7
kNN 96.8 98.2 95.5 96.8 93.6 96.8
14 14 99.96 Ensemble 98.3 97.9 98.7 98.3 96.6 98.3
Hybrid 98.8 100.0 97.7 98.9 97.6 98.9
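The nested "top-k" datasets evaluated row by row in Tables 10–21 (top 1 feature, top 2, ..., top 14) follow directly from a feature ranking. The sketch below illustrates that construction; the scoring function is a crude class-separation proxy standing in for the Eta statistic used in the study, whose exact implementation is not reproduced here.

```python
def rank_features(X, y):
    """Return column indices sorted from most to least class-associated.
    Placeholder score: absolute difference of per-class means (NOT the
    Eta statistic the study actually uses)."""
    n = len(X)
    n_pos = sum(y)
    def score(j):
        col = [row[j] for row in X]
        mean1 = sum(v for v, t in zip(col, y) if t == 1) / max(n_pos, 1)
        mean0 = sum(v for v, t in zip(col, y) if t == 0) / max(n - n_pos, 1)
        return abs(mean1 - mean0)
    return sorted(range(len(X[0])), key=score, reverse=True)

def top_k_subsets(X, order):
    """Yield the nested datasets: top-1, top-2, ..., top-k ranked features."""
    for k in range(1, len(order) + 1):
        keep = order[:k]
        yield k, [[row[j] for j in keep] for row in X]

# Toy data: feature 0 separates the classes, feature 1 is constant.
X = [[0, 5], [1, 5], [10, 5], [11, 5]]
y = [0, 0, 1, 1]
order = rank_features(X, y)
subsets = dict(top_k_subsets(X, order))
print(order, subsets[1])
# → [0, 1] [[0], [1], [10], [11]]
```

In the study, each such subset would then be balanced and passed to the four classifiers, producing one group of rows per k in the tables above.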


Fig 6 Performance of All Classification Models on A Dataset


Fig 7 Performance of All Classification Models on B Dataset


Fig 8 Performance of All Classification Models on C Dataset


Fig 9 Performance of All Classification Models on D Dataset


Fig 10 Performance of All Classification Models on E Dataset


Fig 11 Performance of All Classification Models on F Dataset


IV. DISCUSSION

In our study, classification methods such as Ensemble, kNN, SVMs, and Hybrid Artificial Intelligence were applied within the scope of machine learning. Based on the results obtained, a breast cancer diagnosis algorithm was developed. The literature reveals that various studies have employed different machine learning and feature selection algorithms to work on datasets with different characteristics for breast cancer diagnosis.

In the literature, various machine learning techniques have been utilized, including Artificial Neural Network (ANN), Support Vector Machines (SVMs), Naive Bayes (NB), Classification and Regression Tree (CART), k-Nearest Neighbors (kNN), Linear Regression (LR), Multilayer Perceptron (MLP), Random Forest, Extreme Boost, Decision Tree (C4.5), Logistic Regression, Linear Discriminant Analysis, Boosting and AdaBoost, the Bagging Algorithm, IBk (instance-based learning with certain parameters), and the Random Committee Algorithm.

In the literature, various performance metrics have been employed to evaluate machine learning models. These include F-Measure, AUC (Area Under the ROC Curve), the ROC (Receiver Operating Characteristic) curve, accuracy, recall, sensitivity, specificity, kappa statistics, TP Rate (True Positive Rate), FP Rate (False Positive Rate), MCC (Matthews Correlation Coefficient), time complexity, the Lift Curve, the Calibration Plot, and techniques like Recursive Feature Elimination.

In our study, performance evaluation criteria similar to those used in the literature, such as accuracy, specificity, sensitivity, the kappa statistic, F-Measure, and AUC, were also applied [27], [28].

In our study, the highest accuracy rate was achieved with the Ensemble method at 99.3%, the highest specificity rate was also obtained with the Ensemble method at 98.7%, and the highest sensitivity rate was found to be 100% across multiple methods.

In the literature, machine learning algorithms have been developed on various platforms such as R, Weka, Spark, and Python [47], [48], [49]. The machine learning algorithms implemented on these platforms have been observed to be less effective than those used in this study.

In contrast to our study, the majority of studies in the literature have been conducted using the Wisconsin Diagnostic Breast Cancer (WDBC) dataset [50], [51], [52]. In the literature, only one study used the same dataset as our research [29]. Other studies, however, were conducted on datasets different from the Wisconsin dataset [53], [54].

Similar to our study, some works in the literature have employed common stages such as creating subsets by reducing features, data cleaning, feature selection, and feature extraction. However, these studies did not achieve high accuracy rates [55], [56], [57].

In the literature, some studies have divided the dataset into 80% training and 20% testing. Similar to these studies, many others have created subsets by reducing features and conducted feature selection [31]. In another study, the training and testing datasets were split 66% to 33%, where SVMs achieved the best accuracy performance with 96.9957% [34]. However, while most models in the literature achieved accuracy rates above 90%, the highest accuracy rate in our study, obtained with a 50% training and 50% testing split, reached 99.3% with the Ensemble method; the other methods reached 99.2% with kNN, 98.8% with SVMs, and 99.2% with the Hybrid method.

In another study from the literature, the highest specificity rate was found to be 99.07% and the highest sensitivity rate was 98.41% [30]. In our study, however, the highest specificity rate was 98.7%, while the highest sensitivity rate reached 100%.

In another study from the literature, the highest AUC value was 99.5% using the SVMs method, while in our study the AUC value was 99.4% with the Ensemble method [57]. Regarding the F1 score, the highest value in the literature was 98.1% using the SVMs method, whereas in our study the highest F1 score was 99.3% with the Ensemble method [57].

In studies that used the same dataset and similar machine learning algorithms as in our research, the highest accuracy rate was found to be 82.70% using the Random Forest method, the highest specificity rate was 84% using the SVMs method, and the highest sensitivity rate was 84% using the Extreme Boost method [29].

When reviewing all studies, it is observed that most performance criteria in machine learning models do not exceed 99.68% accuracy [26]. Comparing the study with the highest accuracy rate in the literature to our work, the accuracy rate of 99.68% achieved using the SVMs algorithm in a Spark environment is higher than any accuracy rate in our study. However, in the study that employed deep learning techniques such as CNN, SAE, and SSAE, only one of the four machine learning algorithms used in our study produced a lower accuracy rate; the remaining three outperformed the accuracy rates of the deep learning techniques used in the literature [51].

In a study from the literature that applied similar stages such as feature selection and feature extraction techniques to ANN, SVMs, and NB for breast cancer, the highest specificity rate was found to be 99.07%, and the highest sensitivity rate was 98.41%. These values represent the highest specificity and sensitivity rates reported in the literature for breast cancer research [30]. However, in our study, the highest specificity


rate was found to be 98.7%, and the sensitivity rate reached 100%. These results demonstrate that our study is a reliable approach for detecting both diseased and healthy individuals in breast cancer diagnosis.

In a study comparing six different machine learning techniques—CART, SVMs, NB, kNN, LR, and MLP—for breast cancer diagnosis, the dataset was split into 80% training and 20% testing. Similar to our study, some studies have also created subsets by reducing features and performing feature selection [31]. However, while most models in the literature achieved accuracy rates above 90%, the highest accuracy rate found in our study, using a 50% training and 50% testing split, reached 99.3% with the Ensemble method. This indicates that our study is more applicable and suitable for use in the healthcare field.

In some studies, the same dataset used in our research was also employed, along with similar machine learning algorithms. In these studies, the highest accuracy rate was found to be 82.70% with the Random Forest method, the highest specificity rate was 84% with the SVMs method, and the highest sensitivity rate was 84% with the Extreme Boost method [29]. In contrast, our study showed the highest accuracy rate of 99.3% and the highest specificity rate of 98.7%, both achieved with the Ensemble method, and the highest sensitivity rate of 100% with multiple methods. This demonstrates that our study can achieve higher accuracy, specificity, and sensitivity rates with different machine learning techniques, and it reinforces the reliability of our study in detecting both diseased and healthy individuals in breast cancer diagnosis, indicating its applicability and suitability for use in the healthcare field.

In another study, ANN and SVMs were used for breast cancer classification prediction, implemented using WEKA. The training and testing datasets were split 66% to 33%. The experimental results showed that SVMs achieved the best accuracy performance at 96.9957% [34]. Despite being conducted on a different platform, with fewer machine learning algorithms and different training and testing percentages, the study did not achieve an accuracy rate close

99.2% with the kNN method, 98.8% with the SVMs method, and 99.2% with the Hybrid method. This demonstrates that our study outperforms similar studies in the literature, particularly in the healthcare field.

In another study, machine learning algorithms such as SVMs, C4.5, NB, and kNN were used. Each algorithm was evaluated in terms of accuracy, precision, sensitivity, and specificity. The highest accuracy rate was observed with SVMs at 97.13%, while the accuracy rates for C4.5, NB, and kNN ranged between 95.12% and 95.28%. All the applications in this study were conducted using the WEKA data mining tool [33]. In our study, which examined similar algorithms using Matlab, the highest accuracy rates were 99.3% with the Ensemble method, 99.2% with the kNN method, 98.8% with the SVMs method, and 99.2% with the Hybrid method. This indicates that, despite using similar algorithms, our study achieved higher accuracy rates on a different platform. Consequently, this demonstrates that the algorithms used in our study performed better in terms of accuracy and suggests that our research is more effective in the healthcare field compared to the mentioned study in the literature.

In a different study from the literature, three machine learning techniques—SVMs, Random Forest (RF), and Bayesian Networks (BN)—were applied and compared. These techniques were evaluated based on accuracy, recall, precision, and the area under the ROC curve. The entire study was conducted in the WEKA environment, and the highest accuracy rate achieved was 97% [35]. In contrast, our study, conducted in the Matlab environment, achieved a higher accuracy rate of 99.3%. This result suggests that our study may provide more reliable outcomes in breast cancer diagnosis compared to the mentioned study.

In a study from the literature, machine learning techniques such as C4.5, SVMs, and ANN were applied and evaluated in terms of accuracy, specificity, and sensitivity. The analysis results showed that the accuracy values for DT, ANN, and SVMs were 93.6%, 94.7%, and 95.7%, respectively [53]. In contrast, our study, which examined
to the highest value of 99.3% found in our study, where a 50% similar algorithms, achieved higher accuracy rates: 99.3%
training and 50% testing split was applied. This suggests that with the Ensemble method, 99.2% with the kNN method,
the approach in our study could yield superior results in the 98.8% with the SVMs method, and 99.2% with the Hybrid
healthcare field. method. Despite the various machine learning techniques
applied in the literature, none of them surpassed the accuracy
In another study, machine learning algorithms such as values obtained in our study. This indicates that the machine
Random Forest (RF), Naive Bayes (NB), SVMs, and kNN learning techniques used in our research could yield more
were used. After applying feature selection and extraction, reliable results, particularly in breast cancer diagnosis.
these algorithms were implemented using the WEKA
program, with the dataset labels classified as benign and In another study from the literature, machine learning
malignant. However, due to missing values, the number of algorithms such as RF, kNN, and NB were used. Conducted
data points in the dataset was reduced. Similar to our study, in the Python environment, this study compared machine
this research applied data reduction techniques due to data learning algorithms based on accuracy, precision, and F1-
imbalance and used comparable machine learning algorithms Score parameters. The results showed that the highest
and approaches. Nevertheless, the highest accuracy rates accuracy was achieved with the kNN method at 95.9%, the
achieved were 97.9% with the SVMs method, 96% with the highest precision with kNN at 98.27%, the highest recall with
RF method, 92.6% with the Naive Bayes method, and 96.1% RF at 93.65%, and the highest F1-Score with kNN at 94.2%.
with the kNN method [32]. In contrast, our study achieved The accuracy rates for RF and NB were 94.74% and 94.47%,
higher accuracy rates: 99.3% with the Ensemble method, respectively [47]. In our study, while the machine learning

IJISRT24OCT1557 www.ijisrt.com 1495


Volume 9, Issue 10, October– 2024 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/IJISRT24OCT1557

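The accuracy, sensitivity, and specificity rates compared throughout this discussion are standard confusion-matrix ratios. A minimal sketch of how they can be computed from predicted and true labels (the label values below are illustrative only and do not come from the study's dataset):

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for binary label sequences."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, tn, fp, fn):
    # Recall: proportion of diseased cases correctly flagged
    return tp / (tp + fn)

def specificity(tp, tn, fp, fn):
    # Proportion of healthy cases correctly cleared
    return tn / (tn + fp)

# 1 = malignant, 0 = benign (illustrative labels)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(accuracy(tp, tn, fp, fn))     # 0.75
print(sensitivity(tp, tn, fp, fn))  # 0.75
print(specificity(tp, tn, fp, fn))  # 0.75
```

A sensitivity of 100%, as reported for several of our methods, corresponds to the case fn = 0, i.e., no diseased individual is missed.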
In a study comparing machine learning algorithms such as SVMs, C4.5, Naive Bayes (NB), and kNN, each algorithm was evaluated in terms of accuracy, precision, sensitivity, and specificity. The highest accuracy rate was found to be 97.13% with the SVMs method. The accuracy rates for C4.5, Naive Bayes, and kNN varied between 95.12% and 95.28%. All applications were conducted using the WEKA data mining tool [50]. In contrast, the results obtained in our study, conducted in the Matlab environment, showed higher performance: the highest accuracy rate was 99.3% with the Ensemble method, the highest specificity rate was 98.7% with the Ensemble method, and the highest sensitivity rate was 100% across multiple methods. This demonstrates that our study, using similar machine learning techniques, achieved higher percentages of accuracy, specificity, and sensitivity. It also indicates that our research has the lowest error rate in breast cancer diagnosis and the highest capacity for accurate classification.

The primary objective of the study in the literature was to review various data mining and machine learning algorithms used for breast cancer prediction. The focus was on identifying the most suitable algorithm with the highest accuracy for breast cancer diagnosis. The study examined linear algorithms (e.g., Logistic Regression, Linear Discriminant Analysis), nonlinear algorithms (e.g., CART, Naive Bayes, kNN, SVMs), and ensemble algorithms (e.g., Ctree, Random Forest, Boosting, AdaBoost). A comparative analysis of each algorithm's accuracy was performed, and the most appropriate machine learning algorithms for breast cancer diagnosis were identified. It was found that different techniques were suitable under different conditions and datasets. Among all machine learning algorithms compared, SVMs emerged as the most appropriate for breast cancer prediction, achieving the highest accuracy of 98.03% in WEKA and 99.68% in Spark. Additionally, when applied to a different dataset collected from another database, deep learning techniques such as CNN, SAE, and SSAE achieved an accuracy rate of 98.9% [51]. In comparison, our study was conducted in Matlab, similar to the platforms used in the literature, and employed comparable algorithms such as Ensemble, kNN, SVMs, and Hybrid. The accuracy rates in our study were 99.3% with the Ensemble method, 99.2% with the kNN method, 98.8% with the SVMs method, and 99.2% with the Hybrid method. When comparing our study with the literature, it is observed that the highest accuracy rate achieved in the literature with the SVMs algorithm, 99.68% in the Spark environment, is higher than the accuracy rates achieved in our study. However, when considering the study that applied deep learning techniques like CNN, SAE, and SSAE, only one of the four machine learning algorithms used in our study achieved a lower accuracy rate than the deep learning techniques. The remaining three algorithms in our study outperformed the accuracy rates achieved by these deep learning techniques. This suggests that, in the context of breast cancer diagnosis, the study from the literature is somewhat more successful than our own, particularly when using the SVMs algorithm in a different environment. However, our study demonstrated superiority over the literature that employed deep learning techniques, indicating that our approach remains reliable and effective for breast cancer diagnosis. Despite this, it is important to acknowledge that there is still a study in the literature that has outperformed ours in terms of accuracy.

In a different study from the literature, data mining algorithms such as the Bagging Algorithm, IBk, Random Committee Algorithm, Random Forest Algorithm, and Simple Classification and Regression Tree (Simple CART) Algorithm were used for the diagnosis and prediction of breast cancer. The results were analyzed in the WEKA program using Bayes, Function, Meta, Lazy, Trees, and other perspectives. The analysis revealed that the Random Forest Algorithm had the highest accuracy level, making it the most suitable algorithm for breast cancer diagnosis. The accuracy rate for the Random Forest algorithm was found to be 92.2%, while the Bagging, IBk, and Random Committee Algorithms achieved accuracy rates of 90.9%, 90%, and 90.9%, respectively [48]. In contrast, our study conducted in the Matlab environment differed in terms of the machine learning algorithms and methods applied compared to the aforementioned study. The highest accuracy rates achieved in our study were 99.3% with the Ensemble method, 99.2% with the kNN method, 98.8% with the SVMs method, and 99.2% with the Hybrid method. These results indicate that our study utilized more suitable machine learning algorithms, yielding more efficient outcomes when compared to the literature. This demonstrates that our research is superior in terms of the algorithms and methods applied for breast cancer diagnosis.

In another study from the literature, SVMs, Ctree, and Random Forest algorithms were used for classifying nine models in the diagnosis of breast cancer. The dataset was processed using two different data mining tools, WEKA and Spark, where the accuracy and error rates of the algorithms were compared. The study filtered two datasets (GE and DM) to obtain genes primarily responsible for the presence of tumors. The comparisons between the algorithms in both tools revealed that SVMs had the highest accuracy among the algorithms, with an accuracy rate of 99.68% in Spark and 98.03% in WEKA [26]. In contrast, our study utilized machine learning algorithms in the Matlab environment rather than data mining tools. Despite employing similar algorithms, such as SVMs, our study achieved the highest accuracy rates of 99.3% with the Ensemble method, 99.2% with the kNN method, 98.8% with the SVMs method, and 99.2% with the Hybrid method. When comparing the accuracy rates of this high-accuracy study in the literature to those in our study, it is evident that while some algorithms in the literature surpassed those in our study, overall, our research demonstrated a higher potential for accurately diagnosing breast cancer. This suggests that the methods used in our study could offer more reliable results in breast cancer diagnosis.

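kNN appears in many of the compared studies as well as in our own work. As an illustration of the underlying idea only, the following is a plain Euclidean-distance majority-vote classifier in Python, not the Matlab implementation used in our study, with made-up two-feature samples:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    # Sort training samples by Euclidean distance to x
    dists = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))
    top_labels = [yi for _, yi in dists[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Toy 2-feature samples: 1 = malignant, 0 = benign (illustrative only)
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (1.1, 1.0)))  # 0
print(knn_predict(train_X, train_y, (5.1, 5.0)))  # 1
```

In practice the choice of k, the distance metric, and feature scaling all affect the accuracy figures that the cited studies report.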
In a study aligned with our research, the NB, kNN, and J48 algorithms were used to predict nine different types of breast cancer. The study initially compared symptoms based on the training dataset to test the accuracy of the results, with matching symptoms indicating correctness. Throughout this process, different types of breast cancer were predicted, and each algorithm was classified based on accuracy rates. It was found that the accuracy rates of NB and kNN were higher than that of the J48 decision tree classifier, with accuracy values of 98.2%, 98.8%, and 98.5%, respectively [54]. In our study conducted in the Matlab environment, despite using similar machine learning algorithms like kNN, the highest accuracy rates achieved were 99.3% with the Ensemble method, 99.2% with the kNN method, 98.8% with the SVMs method, and 99.2% with the Hybrid method. This indicates that the findings in our study could lead to higher detection rates of breast cancer, suggesting a more effective approach in the healthcare field for identifying breast cancer with greater accuracy.

In the literature, a study focused on breast cancer diagnosis utilized a predictive machine learning model based on SVMs with a recursive feature elimination technique. The goal of the study was to select the correct features from a dataset of individuals with benign and malignant tumors. The recursive feature elimination technique was employed to evaluate the SVMs algorithm, and the performance matrix was designed to check the accuracy of the predictive SVMs model across different kernel types. The study reported an accuracy of 99% with the linear kernel, 98% with the RBF kernel, 97% with the polynomial kernel, and 84% with the sigmoid kernel [55]. In our study conducted in the Matlab environment, despite using a similar machine learning algorithm like SVMs, the highest accuracy rates achieved were 99.3% with the Ensemble method, 99.2% with the kNN method, 98.8% with the SVMs method, and 99.2% with the Hybrid method. When comparing the SVMs results specifically, the literature reported a slightly higher accuracy with the SVMs method. However, when evaluating the overall system, all other machine learning algorithms in our study outperformed those in the mentioned study. This suggests that our study may provide more reliable and accurate results in the healthcare field, particularly in breast cancer diagnosis, indicating the potential for better detection and classification outcomes.

In a different study from the literature, an effective model for early-stage breast cancer detection was proposed. The BCD model in the literature aimed to address data-related problems and improve classifier performance using a 10-fold cross-validation technique with SVMs. Various evaluation metrics, such as F1 measure, class accuracy, ROC Curve, AUC, Lift Curve, and Calibration Plot, were used, and models like AdaBoost, Random Forest, and Naïve Bayes were applied. The proposed BCD model achieved the highest accuracy of 98.1% and an AUC value of 0.995 among all applied models [57]. In our study, conducted in the Matlab environment, different machine learning algorithms were used, but similar evaluation metrics such as accuracy, F1 Score, and AUC were employed. When comparing these metrics, the highest accuracy rate in the literature was 98.1% with the SVMs method, while our study achieved 99.3% with the Ensemble method, 99.2% with the kNN method, 98.8% with the SVMs method, and 99.2% with the Hybrid method. For the AUC value, the literature reported the highest rate of 99.5% with the SVMs method, whereas our study found 99.4% with the Ensemble method. Regarding the F1 Score, the highest rate in the literature was 98.1% with the SVMs method, compared to 99.3% with the Ensemble method in our study. Although the AUC value in the literature was slightly higher than in our study, our research demonstrated higher F1 and accuracy values. This indicates that while the literature's model might be reliable in predicting breast cancer based on the AUC value, the F1 and accuracy values can vary. When comparing these common evaluation metrics, our study shows higher performance and suggests that more reliable results can be obtained in breast cancer diagnosis.

In a study from the literature, a new Nested Ensemble (NE) technique was used for breast cancer detection. This study created four two-layered Nested Ensemble classifiers based on voting and stacking techniques, named SV-BayesNet-2-MetaClassifier, SV-Naive Bayes-2-MetaClassifier, SV-BayesNet-3-MetaClassifier, and SV-Naive Bayes-3-MetaClassifier. In addition to these four classifiers, BayesNet and Naive Bayes (NB) classifiers were also evaluated. The performance of these classifiers was assessed using typical metrics such as accuracy, precision, recall, and ROC Curve. All experiments were conducted using the open-source machine learning software WEKA 3.9.1. The results showed that the highest accuracy for the BayesNet algorithm was 95.25% with an F1 score of 95.30%, while the NB algorithm achieved a maximum accuracy of 93.32% with an F1 score of 93.30%. The SV-BayesNet-2-MetaClassifier and SV-Naive Bayes-2-MetaClassifier both reached a maximum accuracy of 97.72% with an F1 score of 97.70%. The SV-BayesNet-3-MetaClassifier and SV-Naive Bayes-3-MetaClassifier both achieved the highest accuracy of 98.07% with an F1 score of 98.10% [52]. In our study conducted in the Matlab environment, different machine learning algorithms were used, yet similar evaluation metrics such as accuracy and F1 Score were applied. When comparing these metrics, the highest accuracy rate in the literature was 98.07%, whereas in our study, the highest accuracy rates were 99.3% with the Ensemble method, 99.2% with the kNN method, 98.8% with the SVMs method, and 99.2% with the Hybrid method. Regarding the F1 Score, the highest value in the literature was 98.1%, while in our study, the highest F1 Score was 99.3% with the Ensemble method. These comparisons show that the F1 and accuracy values in the literature are significantly lower than those obtained in our study. Therefore, when examining common metrics like F1 and accuracy, it is evident that our study can achieve more reliable and accurate results in breast cancer diagnosis.

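The voting idea behind such ensemble and hybrid combinations can be sketched generically as a majority vote over the outputs of several base classifiers. The base predictions below are invented for illustration and do not reproduce our system or any of the cited ones:

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-sample predictions from several base classifiers
    into one label per sample by simple majority vote."""
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        votes = [model_preds[i] for model_preds in predictions_per_model]
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical outputs of three base classifiers on four samples
svm_preds = [1, 0, 1, 0]
knn_preds = [1, 1, 1, 0]
nb_preds  = [0, 0, 1, 0]
print(majority_vote([svm_preds, knn_preds, nb_preds]))  # [1, 0, 1, 0]
```

Stacking, by contrast, replaces the fixed vote with a meta-classifier trained on the base predictions, which is the distinction drawn in the Nested Ensemble study discussed above.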
In a different study from the literature, data mining tools were used for breast cancer prediction. The primary focus of the study was to classify algorithms such as Naive Bayes (NB), Bayesian Logistic Regression, Simple CART, and J48 based on accuracy and time complexity. The analysis conducted in the WEKA environment found the highest accuracy rates as follows: 95.27% with the NB method, 65.42% with Bayesian Logistic Regression, 98.13% with Simple CART, and 97.27% with the J48 method [58]. In our study, conducted in Matlab, completely different algorithms were used, resulting in higher accuracy rates: 99.3% with the Ensemble method, 99.2% with the kNN method, 98.8% with the SVMs method, and 99.2% with the Hybrid method. As a result, all algorithms used in our study achieved accuracy rates higher than those reported in the literature. This demonstrates that our research offers a more suitable approach for breast cancer diagnosis in the healthcare field.

In a different study from the literature, five non-linear machine learning algorithms—MLP, kNN, CART, SVMs, and Gaussian NB—were compared for breast cancer detection. The study's primary objective was to evaluate the effectiveness of these algorithms in terms of accuracy, precision, and recall for breast cancer detection. The accuracy rates of all algorithms were analyzed, with MLP achieving the highest accuracy of 99.12%, outperforming kNN, CART, NB, and SVMs. The accuracy rates for the other algorithms were 95.61% with kNN, 93.85% with CART, 94.73% with NB, and 98.24% with SVMs [59]. In our study conducted in Matlab, similar algorithms like kNN and SVMs were used, achieving higher accuracy rates: 99.3% with the Ensemble method, 99.2% with kNN, 98.8% with SVMs, and 99.2% with the Hybrid method. These results indicate that our study achieved higher accuracy values for similar machine learning techniques like kNN and SVMs. This demonstrates that our research can potentially provide more precise results in breast cancer diagnosis within medical applications.

In another study from the literature, researchers conducted a comparative analysis of NB, Random Forest, Logistic Regression, MLP, and kNN for breast cancer prediction. The evaluation of all these algorithms was conducted based on metrics such as Kappa Statistics, TP Rate, FP Rate, Precision, Recall, F-Measure, MCC, and ROC Area, focusing on the accuracy of each algorithm. Each algorithm was applied to the dataset to analyze its accuracy. The accuracy rates for kNN, NB, and Random Forest were 72.3%, 71.6%, and 69.5%, respectively, while Logistic Regression and MLP achieved accuracy rates of 68.8% and 64.6%, respectively. In terms of F-Measure, the values for kNN, NB, and Random Forest were 69.7%, 71.7%, and 66.9%, while Logistic Regression and MLP achieved F-Measure values of 67.5% and 64.7%, respectively [49]. In our study conducted in Matlab, similar machine learning algorithms like kNN were used, along with similar evaluation metrics such as accuracy and F1 Score. When comparing all parameters, the highest accuracy rate in the literature was 72.3% with kNN, whereas in our study, the highest accuracy rates were 99.3% with the Ensemble method, 99.2% with kNN, 98.8% with SVMs, and 99.2% with the Hybrid method. In terms of F1 Score, the highest value in the literature was 71.7%, while in our study, the highest F1 Score was 99.3% with the Ensemble method. When comparing the commonly used accuracy and F1 values between our study and the literature, it is evident that the values in our study are significantly higher. This indicates that our research has the potential to achieve a higher performance in breast cancer prediction.

When all these results are compared, our study demonstrates that it can achieve higher percentages of accuracy, specificity, and sensitivity using different machine learning techniques. This reinforces the reliability of our study in accurately identifying both diseased and healthy individuals in breast cancer diagnosis, highlighting its suitability and applicability in the healthcare field as a more effective and practical approach.

 Conflict of Interest
The authors declare that there is no conflict of interest.

 Financial Support
No funding was received for this research.

[9]. “Breast Cancer Treatment Options - National Breast [22]. A. Kumar Jakhar, M. Singh, and A. Gupta, “SELF: A
Cancer Foundation.” Accessed: Jan. 15, 2024. Stacked-based Ensemble Learning Framework for
[Online]. Available: Breast Cancer Classiication SELF: A Stacked-based
https://www.nationalbreastcancer.org/breast-cancer- Ensemble Learning Framework for Breast Cancer
treatment/ Classification,” 2022, doi: 10.21203/rs.3.rs-
[10]. V. Chaurasia, S. Pal, and B. B. Tiwari, “Prediction of 2013877/v1.
benign and malignant breast cancer using data mining [23]. A. F. M. Agarap, “On breast cancer detection: An
techniques,” J. Algorithms Comput. Technol., vol. 12, application of machine learning algorithms on the
no. 2, pp. 119–126, Jun. 2018, doi: Wisconsin diagnostic dataset,” in ACM International
10.1177/1748301818756225. Conference Proceeding Series, Association for
[11]. K. Cheng, J. Wang, J. Liu, X. Zhang, Y. Shen, and H. Computing Machinery, Feb. 2018, pp. 5–9. doi:
Su, “Public health implications of computer-aided 10.1145/3184066.3184080.
diagnosis and treatment technologies in breast cancer [24]. A. U. Haq et al., “Detection of Breast Cancer through
care,” AIMS Public Heal., vol. 10, no. 4, p. 867, 2023, Clinical Data Using Supervised and Unsupervised
doi: 10.3934/PUBLICHEALTH.2023057. Feature Selection Techniques,” IEEE Access, vol. 9,
[12]. E. H. Houssein, M. M. Emam, A. A. Ali, and P. N. pp. 22090–22105, 2021, doi:
Suganthan, “Deep and machine learning techniques 10.1109/ACCESS.2021.3055806.
for medical imaging-based breast cancer: A [25]. U. Naseem et al., “An Automatic Detection of Breast
comprehensive review,” Apr. 01, 2021, Elsevier Ltd. Cancer Diagnosis and Prognosis Based on Machine
doi: 10.1016/j.eswa.2020.114161. Learning Using Ensemble of Classifiers,” IEEE
[13]. V. J. Kadam, S. M. Jadhav, and K. Vijayakumar, Access, vol. 10, pp. 78242–78252, 2022, doi:
“Breast Cancer Diagnosis Using Feature Ensemble 10.1109/ACCESS.2022.3174599.
Learning Based on Stacked Sparse Autoencoders and [26]. S. Alghunaim and H. H. Al-Baity, “On the Scalability
Softmax Regression,” J. Med. Syst., vol. 43, no. 8, of Machine-Learning Algorithms for Breast Cancer
Aug. 2019, doi: 10.1007/s10916-019-1397-z. Prediction in Big Data Context,” IEEE Access, vol. 7,
[14]. S. I. Ayon, M. M. Islam, and M. R. Hossain, pp. 91535–91546, 2019, doi:
“Coronary Artery Heart Disease Prediction: A 10.1109/ACCESS.2019.2927080.
Comparative Study of Computational Intelligence [27]. M. K. Uçar, M. R. Bozkurt, C. Bilgin, and K. Polat,
Techniques,” IETE J. Res., vol. 68, no. 4, pp. 2488– “Automatic detection of respiratory arrests in OSA
2507, 2022, doi: 10.1080/03772063.2020.1713916. patients using PPG and machine learning techniques,”
[15]. L. J. Muhammad, M. M. Islam, S. S. Usman, and S. I. Neural Comput. Appl., vol. 28, no. 10, pp. 2931–
Ayon, “Predictive Data Mining Models for Novel 2945, Oct. 2017, doi: 10.1007/s00521-016-2617-9.
Coronavirus (COVID-19) Infected Patients’ [28]. M. K. Uçar, M. R. Bozkurt, C. Bilgin, and K. Polat,
Recovery,” SN Comput. Sci., vol. 1, no. 4, Jul. 2020, “Automatic sleep staging in obstructive sleep apnea
doi: 10.1007/s42979-020-00216-w. patients using photoplethysmography, heart rate
[16]. M. R. Haque, M. M. Islam, H. Iqbal, M. Sumon Reza, variability signal and machine learning techniques,”
and M. K. Hasan, “Performance Evaluation of Neural Comput. Appl., vol. 29, no. 8, pp. 1–16, Apr.
Random Forests and Artificial Neural Networks for 2018, doi: 10.1007/s00521-016-2365-x.
the Classification of Liver Disorder.” [29]. M. D. Ganggayah, N. A. Taib, Y. C. Har, P. Lio, and
[17]. S. Islam Ayon and M. Milon Islam, “Diabetes S. K. Dhillon, “Predicting factors for survival of
Prediction: A Deep Learning Approach,” Int. J. Inf. breast cancer patients using machine learning
Eng. Electron. Bus., vol. 11, no. 2, pp. 21–27, Mar. techniques,” BMC Med. Inform. Decis. Mak., vol. 19,
2019, doi: 10.5815/ijieeb.2019.02.03. no. 1, pp. 1–17, 2019, doi: 10.1186/s12911-019-0801-
[18]. M. F. Ak, “A comparative analysis of breast cancer 4.
detection and diagnosis using data visualization and [30]. D. A. Omondiagbe, S. Veeramani, and A. S. Sidhu,
machine learning applications,” Healthc., vol. 8, no. 2, “Machine Learning Classification Techniques for
2020, doi: 10.3390/healthcare8020111. Breast Cancer Diagnosis,” IOP Conf. Ser. Mater. Sci.
[19]. R. C. Conceição et al., “Classification of breast tumor Eng., vol. 495, no. 1, 2019, doi: 10.1088/1757-
models with a prototype microwave imaging system,” 899X/495/1/012033.
Med. Phys., vol. 47, no. 4, pp. 1860–1870, Apr. 2020, [31]. V. Chaurasia and S. Pal, “Applications of Machine
doi: 10.1002/mp.14064. Learning Techniques to Predict Diagnostic Breast
[20]. D. Muduli, R. Dash, and B. Majhi, “Automated breast Cancer,” SN Comput. Sci., vol. 1, no. 5, 2020, doi:
cancer detection in digital mammograms: A moth 10.1007/s42979-020-00296-8.
flame optimization based ELM approach,” Biomed. [32]. Y. Khourdifi and M. Bahaj, “Applying best machine
Signal Process. Control, vol. 59, May 2020, doi: learning algorithms for breast cancer prediction and
10.1016/j.bspc.2020.101912. classification,” 2018 Int. Conf. Electron. Control.
[21]. Z. Huang and D. Chen, “A Breast Cancer Diagnosis Optim. Comput. Sci. ICECOCS 2018, pp. 1–5, 2019,
Method Based on VIM Feature Selection and doi: 10.1109/ICECOCS.2018.8610632.
Hierarchical Clustering Random Forest Algorithm,”
IEEE Access, vol. 10, pp. 3284–3293, 2022, doi:
10.1109/ACCESS.2021.3139595.

IJISRT24OCT1557 www.ijisrt.com 1499


Volume 9, Issue 10, October– 2024 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/IJISRT24OCT1557

[33]. H. Asri, H. Mousannif, H. Al Moatassime, and T. Noel, “Using Machine Learning Algorithms for Breast Cancer Risk Prediction and Diagnosis,” Procedia Comput. Sci., vol. 83, pp. 1064–1069, 2016, doi: 10.1016/j.procs.2016.04.224.
[34]. E. A. Bayrak, P. Kirci, and T. Ensari, “Comparison of machine learning methods for breast cancer diagnosis,” 2019 Sci. Meet. Electr. Biomed. Eng. Comput. Sci. (EBBT 2019), pp. 4–6, 2019, doi: 10.1109/EBBT.2019.8741990.
[35]. Y. Shinde, A. Kenchappagol, and S. Mishra, “Comparative Study of Machine Learning Algorithms for Breast Cancer Classification,” Smart Innov. Syst. Technol., vol. 286, pp. 545–554, 2022, doi: 10.1007/978-981-16-9873-6_49.
[36]. “Breast Cancer.” Accessed: May 08, 2024. [Online]. Available: https://www.kaggle.com/datasets/reihanenamdari/breast-cancer/data
[37]. M. K. Uçar, M. R. Bozkurt, and C. Bilgin, “Elektrokardiyogram Sinyalinin Uyku / Uyanıklık Evreleri için İstatistiksel Olarak İncelenmesi,” Süleyman Demirel Üniversitesi Fen Bilim. Enstitüsü Derg., vol. 24, no. 2, pp. 502–507, Aug. 2020, doi: 10.19113/sdufenbed.555651.
[38]. E. Kartal, “Sınıflandırmaya Dayalı Makine Öğrenmesi Teknikleri ve Kardiyolojik Risk Değerlendirmesine İlişkin Bir Uygulama,” Jan. 2015.
[39]. M. K. Uçar, “Eta Correlation Coefficient Based Feature Selection Algorithm for Machine Learning: E-Score Feature Selection Algorithm,” J. Intell. Syst. Theory Appl., vol. 2, no. 1, pp. 7–12, 2019, doi: 10.38016/jista.498799.
[40]. “‘Ensemble Karar Ağaçları-Ensemble Decision Trees (DT) / Regresyon Algoritması’ by İlyas Berk Fırsat on Prezi.” Accessed: Mar. 28, 2024. [Online]. Available: https://prezi.com/p/vzfhtellvaz4/ensemble-karar-agaclar-ensemble-decision-trees-dt-regresyon-algoritmas/
[41]. “Makine Öğrenimi Bölüm-5 (Karar Ağaçları).” Accessed: Mar. 27, 2024. [Online]. Available: https://www.linkedin.com/pulse/makine-öğrenimi-bölüm-5-karar-ağaçları-eyüp-kaan-ülgen/?originalSubdomain=tr
[42]. “KNN (K-En Yakın Komşu) | by Abdullah Atcılı | Machine Learning Türkiye | Medium.” Accessed: Mar. 28, 2024. [Online]. Available: https://medium.com/machine-learning-türkiye/knn-k-en-yakın-komşu-7a037f056116
[43]. “k-en Yakın Komşu Algoritması ve Bir Uygulama (Kredi Riskini Sınıflandırma).” Accessed: Mar. 27, 2024. [Online]. Available: https://docplayer.biz.tr/15448102-K-en-yakin-komsu-algoritmasi-ve-bir-uygulama-kredi-riskini-siniflandirma.html
[44]. B. H. Aymen Fathalla Alhasadi, “Predicting Breast Cancer by Using Artificial Neural Network,” M.S. thesis, 2016.
[45]. M. Kürşad, “Obstrüktif Uyku Apne Teşhisi İçin Makine Öğrenmesi Tabanlı Yeni Bir Yöntem Geliştirilmesi,” Ph.D. thesis.
[46]. Y. Wang, Y. M. Chu, A. Thaljaoui, Y. A. Khan, W. Chammam, and S. Z. Abbas, “A multi-feature hybrid classification data mining technique for human-emotion,” BioData Min., vol. 14, no. 1, Dec. 2021, doi: 10.1186/s13040-021-00254-x.
[47]. S. Sharma, A. Aggarwal, and T. Choudhury, “Breast Cancer Detection Using Machine Learning Algorithms,” Proc. Int. Conf. Comput. Tech. Electron. Mech. Syst. (CTEMS 2018), pp. 114–118, 2018, doi: 10.1109/CTEMS.2018.8769187.
[48]. M. Kaya Keleş, “Breast cancer prediction and detection using data mining classification algorithms: A comparative study,” Teh. Vjesn., vol. 26, no. 1, pp. 149–155, 2019, doi: 10.17559/TV-20180417102943.
[49]. S. Bharati, M. A. Rahman, and P. Podder, “Breast cancer prediction applying different classification algorithm with comparative analysis using WEKA,” 4th Int. Conf. Electr. Eng. Inf. Commun. Technol. (iCEEiCT 2018), pp. 581–584, 2019, doi: 10.1109/CEEICT.2018.8628084.
[50]. A. Bharat, N. Pooja, and R. A. Reddy, “Using Machine Learning algorithms for breast cancer risk prediction and diagnosis,” 2018 IEEE 3rd Int. Conf. Circuits, Control, Commun. Comput. (I4C 2018), pp. 1–4, 2018, doi: 10.1109/CIMCA.2018.8739696.
[51]. N. Fatima, L. Liu, S. Hong, and H. Ahmed, “Prediction of Breast Cancer, Comparative Review of Machine Learning Techniques, and Their Analysis,” IEEE Access, vol. 8, pp. 150360–150376, 2020, doi: 10.1109/ACCESS.2020.3016715.
[52]. M. Abdar et al., “A new nested ensemble technique for automated diagnosis of breast cancer,” Pattern Recognit. Lett., vol. 132, pp. 123–131, Apr. 2020, doi: 10.1016/j.patrec.2018.11.004.
[53]. A. LG and E. AT, “Using Three Machine Learning Techniques for Predicting Breast Cancer Recurrence,” J. Heal. Med. Informatics, vol. 4, no. 2, 2013, doi: 10.4172/2157-7420.1000124.
[54]. S. K. Maliha, R. R. Ema, S. K. Ghosh, H. Ahmed, M. R. J. Mollick, and T. Islam, “Cancer Disease Prediction Using Naive Bayes, K-Nearest Neighbor and J48 algorithm,” 2019 10th Int. Conf. Comput. Commun. Netw. Technol. (ICCCNT 2019), pp. 1–7, 2019, doi: 10.1109/ICCCNT45670.2019.8944686.
[55]. M. H. Memon, J. P. Li, A. U. Haq, M. H. Memon, W. Zhou, and R. Lacuesta, “Breast Cancer Detection in the IOT Health Environment Using Modified Recursive Feature Selection,” Wirel. Commun. Mob. Comput., vol. 2019, 2019, doi: 10.1155/2019/5176705.

IJISRT24OCT1557 www.ijisrt.com 1500



[56]. A. A. Said, L. A. Abd-Elmegid, S. Kholeif, and A. A. Gaber, “Classification based on clustering model for predicting main outcomes of breast cancer using Hyper-Parameters Optimization,” Int. J. Adv. Comput. Sci. Appl., vol. 9, no. 12, pp. 268–273, 2018, doi: 10.14569/IJACSA.2018.091239.
[57]. P. Israni, “Breast cancer diagnosis (BCD) model using machine learning,” Int. J. Innov. Technol. Explor. Eng., vol. 8, no. 10, pp. 4456–4463, 2019, doi: 10.35940/ijitee.J9973.0881019.
[58]. S. N. Singh and S. Thakral, “Using data mining tools for breast cancer prediction and analysis,” 2018 4th Int. Conf. Comput. Commun. Autom. (ICCCA 2018), pp. 1–4, 2018, doi: 10.1109/CCAA.2018.8777713.
[59]. A. Al Bataineh, “A comparative analysis of nonlinear machine learning algorithms for breast cancer detection,” Int. J. Mach. Learn. Comput., vol. 9, no. 3, pp. 248–254, Jun. 2019, doi: 10.18178/ijmlc.2019.9.3.794.

