An Overview of Machine Learning Classification Tec
ISCKU 2024
1 INTRODUCTION
Machine learning is an interdisciplinary field that draws its foundations from
statistical theory, computer science, and other related domains. Its meteoric rise in
* Corresponding author: [email protected]
© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution
License 4.0 (https://creativecommons.org/licenses/by/4.0/).
BIO Web of Conferences 97, 00133 (2024) https://doi.org/10.1051/bioconf/20249700133
2 MACHINE LEARNING
3 TYPES OF CLASSIFICATION
Classification tasks fall into four primary types: binary classification,
multi-class classification, multi-label classification, and imbalanced
classification. Further discussion of these categories can be found in
[10], [12], [16].
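As a rough illustration of how the four task types differ, the sketch below shows one possible shape of the target data for each. All labels and counts are invented for illustration, and the helper name class_ratio is hypothetical.

```python
from collections import Counter

# Hypothetical toy targets, one per task type (all values invented).
binary_y = ["spam", "ham", "spam", "ham"]            # exactly two classes
multiclass_y = ["cat", "dog", "elephant", "dog"]     # one of three or more classes
multilabel_y = [{"news", "sports"}, {"news"}, set()]  # each sample has a set of labels
imbalanced_y = ["normal"] * 95 + ["fraud"] * 5       # class frequencies are skewed


def class_ratio(labels):
    """Return the share of the most frequent class (0.5 is balanced for two classes)."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


print(len(set(binary_y)))         # number of distinct classes in the binary task
print(len(set(multiclass_y)))     # number of distinct classes in the multi-class task
print(class_ratio(imbalanced_y))  # majority share in the imbalanced task
```

In this sketch the imbalanced task is still a binary task; what sets it apart is only the skew in class frequencies.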
4.1 Zero R
Zero R, or Zero Rule, is the simplest classification method: it relies solely on
the target data and disregards all predictors. The Zero R classifier always
predicts the majority class label. Although Zero R has no predictive power, it is
useful for establishing a baseline performance against which other classification
methods can be compared [8], [18]. Its advantage is that it provides this baseline
for other classification methods; its disadvantage is that it depends only on the
target data.
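The majority-class rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the class name ZeroR, the fit/predict interface, and the toy weather data are assumptions made for the example.

```python
from collections import Counter


class ZeroR:
    """Baseline that always predicts the most frequent training label."""

    def fit(self, X, y):
        # X is accepted for interface symmetry but deliberately ignored.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # Every sample receives the same majority label.
        return [self.majority_ for _ in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: the predictor values play no role in the result.
X = [["sunny"], ["rainy"], ["overcast"], ["sunny"], ["rainy"]]
y = ["yes", "no", "yes", "yes", "no"]

model = ZeroR().fit(X, y)
baseline = accuracy(y, model.predict(X))
print(model.majority_, baseline)
```

The baseline accuracy equals the share of the majority class, so any classifier that uses the predictors should be expected to exceed it.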
4.2 One R
One R, also referred to as One Rule, is a simple classification algorithm that
generates one rule for each predictor in the data. It builds a frequency table for
each predictor against the target and selects the predictor whose rule has the
smallest total error. One R is slightly less accurate than state-of-the-art
classification algorithms, but it can be useful for establishing a baseline
performance as a standard for comparison [19]. Compared with state-of-the-art
classifiers, One R has both advantages and disadvantages.
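The frequency-table procedure can be sketched as follows for categorical data. This is an illustrative sketch only; the function name one_r, the dictionary-based row format, and the toy weather data are assumptions and do not come from the cited sources.

```python
from collections import Counter, defaultdict


def one_r(rows, target):
    """Return (best_predictor, rule, total_errors) for categorical data.

    rows: list of dicts mapping column name -> categorical value.
    target: name of the column to predict.
    """
    best = None
    predictors = [col for col in rows[0] if col != target]
    for col in predictors:
        # Frequency table: predictor value -> counts of each target class.
        table = defaultdict(Counter)
        for row in rows:
            table[row[col]][row[target]] += 1
        # One rule per predictor: each value maps to its majority class.
        rule = {value: counts.most_common(1)[0][0] for value, counts in table.items()}
        # Errors are the samples that fall outside the majority class of their value.
        errors = sum(sum(c.values()) - max(c.values()) for c in table.values())
        if best is None or errors < best[2]:
            best = (col, rule, errors)
    return best


# Hypothetical toy data set.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]

predictor, rule, errors = one_r(rows, "play")
print(predictor, errors)
```

On this toy data the rule built on "outlook" misclassifies one sample, while the rule built on "windy" misclassifies two, so "outlook" is selected.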
context, the dependent variable is the variable of interest that we aim to predict, and
the independent variables are the factors that influence the outcome [16], [18]. LR
can also be generalized to model a categorical variable with more than two values,
which is known as multinomial logistic regression. Table 3 summarizes the
advantages and disadvantages of logistic regression.
Table 3 Advantages and disadvantages of LR [16], [22]
Advantages:
Its advantages include simplicity of implementation, computational efficiency,
and ease of regularization. No scaling is required for input features. It is not
affected by small noise in the data or by multicollinearity. Because the output of
logistic regression is a probability score, a cutoff must be chosen, using
performance metrics suited to the problem, in order to classify the target.
Logistic regression is specifically designed for classification problems, and it
helps in understanding how the independent variables influence the outcome of the
dependent variable. It is also the simplest algorithm, does not require high
computational power, and is less prone to overfitting. Additionally, the algorithm
is precise and can be updated easily to reflect new data.
Disadvantages:
Logistic regression has some limitations, including its inability to solve
non-linear problems because of its linear decision surface, and being prone to
overfitting. Additionally, the algorithm requires that all independent variables
be identified in order to work well. Another major disadvantage is that, in its
basic form, logistic regression is only suitable for binary predicted variables,
and it assumes that the data are free of missing values and that the predictors
are independent of each other.
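To make the role of the probability score and the cutoff concrete, the following minimal sketch computes a logistic probability and turns it into a class label. The weights, bias, and input values are hypothetical and were not fitted to any data; the function names are chosen for this example only.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability of the positive class for one sample x."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


def classify(prob, cutoff=0.5):
    """Turn a probability score into a class label using a cutoff."""
    return 1 if prob >= cutoff else 0


# Hypothetical coefficients and input, chosen only for illustration.
weights, bias = [0.8, -0.4], -0.2
prob = predict_proba(weights, bias, [1.0, 2.0])

print(round(prob, 3))
print(classify(prob), classify(prob, cutoff=0.4))
```

The same probability score leads to different labels under different cutoffs, which is why the cutoff is usually chosen with a performance metric suited to the application.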
LR is widely applied in fields such as disease risk identification, word
classification, weather prediction, and voting applications [16]. It is also used
for predicting the risk of developing a disease, for cancer diagnosis, and in
engineering applications such as predicting the probability of failure of a
process, system, or product [22].
Types of Logistic Regression
There are three different types of Logistic Regression algorithms:
Binary Logistic Regression: It has only two possible outcomes. For example,
yes or no.
Multinomial Logistic Regression: It has three or more nominal categories. For
example, cat, dog, elephant.
Ordinal Logistic Regression: It has three or more ordinal categories, ordinal
meaning that the categories will be in an order. For example, user ratings (1-5) [16].
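The three variants are commonly written as follows. The notation is an assumption introduced here and is not taken from the source: x is the feature vector, w and b are the weights and bias, K is the number of categories, and the thresholds θ_k are ordered cut points. The ordinal form shown is the proportional-odds (cumulative logit) model, which is one common choice.

```latex
% Binary logistic regression: probability of the positive class
P(y = 1 \mid x) = \frac{1}{1 + e^{-(w^{\top} x + b)}}

% Multinomial logistic regression: one weight vector per nominal category
P(y = k \mid x) = \frac{\exp(w_k^{\top} x + b_k)}{\sum_{j=1}^{K} \exp(w_j^{\top} x + b_j)},
\qquad k = 1, \dots, K

% Ordinal logistic regression (proportional odds): cumulative probabilities
P(y \le k \mid x) = \frac{1}{1 + e^{-(\theta_k - w^{\top} x)}},
\qquad k = 1, \dots, K - 1, \quad \theta_1 < \theta_2 < \dots < \theta_{K-1}
```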
Fig. 14. The basic components of a two-layer artificial neural network [36]
Neural networks can handle multiple regression and classification tasks
simultaneously [21]. However, using ANNs can be challenging because of the time
required for training on complex data and their black-box nature, in which users
cannot interfere with the final decision-making process [17]. The concept of the
ANN was introduced by McCulloch and Pitts in 1943, with the aim of simulating the
functions and structure of living beings' nervous systems [38], as shown in Figure
15, and served as a base model for Rosenblatt's Perceptron [35].
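As a minimal sketch of the idea behind the perceptron, the code below trains a single threshold unit with the classic perceptron update rule on the logical AND function. The training data, learning rate, and number of epochs are assumptions chosen for illustration, and the code is not taken from the cited works.

```python
def step(z):
    """Threshold activation: the unit fires (1) when its input is non-negative."""
    return 1 if z >= 0 else 0


def predict(weights, bias, x):
    """Output of a single threshold unit for input vector x."""
    return step(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule: adjust weights only when a sample is misclassified."""
    n_inputs = len(samples[0][0])
    weights, bias = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)  # -1, 0, or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Logical AND is linearly separable, so a single unit can learn it.
and_gate = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(and_gate)
outputs = [predict(weights, bias, x) for x, _ in and_gate]
print(outputs)
```

A single unit of this kind can only separate classes with a linear boundary; multi-layer networks such as the one in the figure above combine many units to model non-linear relationships.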
6 CONCLUSION
ACKNOWLEDGEMENT
I would like to thank the reviewers for their useful suggestions, which improved the
presentation of this paper. I would also like to extend my thanks and gratitude to my
honorable supervisor for her continuous support and her tireless scientific advice,
which helped make this work a success.
References