International Journal of Pure and Applied Mathematics Special Issue

A Review of Multi-Class Classification Algorithms

Abstract: Classification is one of the crucial tasks of data mining, and many machine learning algorithms are inherently designed for binary decision problems. Classification is a complex process that may be affected by many factors. This paper examines current practices, problems, and prospects of multi-class classification. In several application domains, such as biology, computer vision, social network analysis and information retrieval, multi-class classification problems arise in which data instances do not simply belong to one particular class, but exhibit partial membership to several classes. The emphasis is placed on summarizing the major advanced classification approaches and the techniques used for improving classification accuracy in multi-class classification for different datasets.

Keywords: Data Mining, Multi-class Classification

1. Introduction

Data mining is the computing process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. It is an essential process in which intelligent methods are applied to extract data patterns, and an interdisciplinary subfield of computer science. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.

Data mining is a collection of techniques for efficient automated discovery of previously unknown, valid, novel, useful and understandable patterns in large databases. Conventionally, the information that is mined is denoted as a model of the semantic structure of the datasets. The model might be utilized for prediction and categorization of new data. In recent years the sizes of databases have increased rapidly. This has led to a growing interest in the development of tools capable of automatically extracting knowledge from data. The term Data Mining, or Knowledge Discovery in Databases, has been adopted for a field of research dealing with the automatic discovery of implicit information or knowledge within databases [16]. Diverse fields such as marketing, customer relationship management, data engineering, medicine, crime analysis, expert prediction, web mining and mobile computing, among others, utilize data mining.

Data mining involves six common classes of tasks:

Anomaly detection (outlier/change/deviation detection) – the identification of unusual data records that might be interesting, or data errors that require further investigation.

Association rule learning (dependency modelling) – searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.

Clustering – the task of discovering groups and structures in the data that are in some way or another "similar", without using known structures in the data.

Classification – the task of generalizing known structure to apply to new data. For example, an e-mail program might attempt to classify an e-mail as "legitimate" or as "spam".

Regression – attempts to find a function which models the data with the least error; that is, to estimate the relationships among data or datasets.

Summarization – providing a more compact representation of the data set, including visualization and report generation.

In this paper we concentrate on the classification of data. Classification is one of the fundamental and most important tasks in data mining and machine learning. Databases are rich with hidden information which can be used for intelligent decision making. Classification and prediction are two forms of data analysis that can be used to extract models describing important data classes or to predict future data trends. Classification is a data mining (machine learning) technique used to predict group membership for data instances. Machine learning refers to a system that has the capability to automatically
learn knowledge from experience and in other ways. Classification predicts categorical labels, whereas prediction models continuous-valued functions. Classification is the task of generalizing known structure to apply to new data, while clustering is the task of discovering groups and structures in the data that are in some way or another similar, without using known structures in the data.

In machine learning, the problem of classification is encountered in various areas: for example, in medicine, to identify the disease of a patient; or in industry, to decide whether a defect has appeared, or whether a temperature is low, middle or high.

Learning methods are divided into the following categories.

Supervised: All data is labeled and the algorithms learn to predict the output from the input data.

Unsupervised: All data is unlabeled and the algorithms learn the inherent structure from the input data.

Semi-supervised: Some data is labeled but most of it is unlabeled, and a mixture of supervised and unsupervised techniques can be used.

Supervised learning is where you have input variables (X) and an output variable (Y) and you use an algorithm to learn the mapping function from the input to the output:

Y = f(X)

Supervised learning problems can be further grouped into regression and classification problems.

Classification: A classification problem is when the output variable is a category, such as "red" or "blue", or "disease" and "no disease".

Regression: A regression problem is when the output variable is a real value, such as "dollars" or "weight".

Some popular examples of supervised machine learning algorithms are:

Linear regression for regression problems.
Random forest for classification and regression problems.
Support vector machines for classification problems.

Semi-supervised learning refers to the use of both labeled and unlabeled data for training. Many machine-learning
researchers have found that unlabeled data, when used in conjunction with a small amount of labeled data, can produce considerable improvement in learning accuracy. Labelled instances, however, are often difficult, expensive, or time-consuming to obtain, as they require the efforts of experienced human annotators, while unlabeled data may be relatively easy to collect. Semi-supervised learning addresses this problem by using a large amount of unlabeled data, together with the labelled data, to build better classifiers; it requires less human effort and gives higher accuracy. Some popular examples of semi-supervised methods are EM with generative mixture models, self-training, co-training, transductive support vector machines and graph-based methods.

2. Methods for Multiclass Classification

There are three groups of methods for solving multiclass classification problems:

1. Extension from the binary case
2. Converting the multiclass classification problem into several binary classification problems
3. Hierarchical classification methods

Extension from binary

These are strategies that extend existing binary classifiers to solve multi-class classification problems. Several algorithms based on neural networks, decision trees, nearest neighbours, naive Bayes, support vector machines and extreme learning machines have been developed to address multi-class classification problems.

2.1 Hierarchical classification

Hierarchical classification methods differ in a number of criteria. The first criterion is the type of hierarchical structure used. This structure is based on the problem structure and is typically either a tree or a DAG.

We can do the same thing with binary classifiers: if we have a large number of classes, we can divide them into two sets of classes, say A and B. Then we can divide the classes in A into two smaller sets of classes, divide B into two smaller sets of classes, and so on. To run our multi-class classification, we would first train a binary classifier to determine whether a new data point is in some class in A or in some class in B. We then train a second binary classifier to determine which of the two subsets of A a point is in, and a third classifier to determine which of the subsets of B a point is in. We continue all the way down, until we get to classifiers that distinguish individual classes. This is called hierarchical classification because the different steps in the scheme form a sort of hierarchy, from the first question (the CEO) to the second-level questions (the vice-presidents) and down to the final questions (the mailroom clerks) that distinguish individual classes.

Transformation to binary

These methods reduce the problem of multiclass classification to multiple binary classification problems. They can be categorized into One vs Rest and One vs One.
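The One vs Rest decomposition just described can be sketched in a few lines of Python. The nearest-centroid scorer below is only an illustrative stand-in for a real binary learner such as an SVM; all function names here are our own, not from any library:

```python
# One-vs-Rest: train one binary scorer per class; at test time,
# the class whose scorer is most confident wins.

def train_ovr(samples, labels, train_binary):
    """Train one binary classifier per distinct class.

    train_binary(samples, binary_labels) -> scoring function,
    a stand-in for any real binary learner (e.g. an SVM)."""
    classifiers = {}
    for cls in set(labels):
        binary_labels = [1 if y == cls else -1 for y in labels]
        classifiers[cls] = train_binary(samples, binary_labels)
    return classifiers

def predict_ovr(classifiers, x):
    # Pick the class whose binary scorer reports the highest score.
    return max(classifiers, key=lambda cls: classifiers[cls](x))

# Toy binary learner: score by negated squared distance to the
# centroid of the positive class (illustrative only).
def centroid_scorer(samples, binary_labels):
    pos = [x for x, y in zip(samples, binary_labels) if y == 1]
    centroid = [sum(c) / len(pos) for c in zip(*pos)]
    return lambda x: -sum((a - b) ** 2 for a, b in zip(x, centroid))

samples = [(0, 0), (0, 1), (5, 5), (5, 6), (9, 0), (9, 1)]
labels  = ["a", "a", "b", "b", "c", "c"]
model = train_ovr(samples, labels, centroid_scorer)
print(predict_ovr(model, (5, 5.5)))  # prints b
```

A One vs One scheme differs only in that a scorer is trained for every pair of classes and the predictions are combined by voting.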
During testing, samples are classified by finding the margin from the linear separating hyperplane, and the final output is the class that corresponds to the SVM with the largest margin. However, if the outputs corresponding to two or more classes are very close to each other, those points are labeled as unclassified. This multiclass method has the advantage that the number of binary classifiers to construct equals the number of classes. However, there are some drawbacks. First, during the training phase the memory requirement is very high, amounting to the square of the total number of training samples; this may cause problems for large training data sets and may lead to computer memory problems. Second, suppose there are K classes and each has an equal number of training samples. During the training phase, the ratio of training samples of one class to the rest of the classes will be 1:(K − 1). This ratio, therefore, shows that the training sample sizes will be unbalanced.

2.2.2 One against One Approach (OAO)

In this method, SVM classifiers for all possible pairs of classes are created. Therefore, for K classes, there will be K(K − 1)/2 binary classifiers.

An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by biological nervous systems. It is composed of a large number of highly interconnected processing elements called neurons. An ANN is configured for a specific application, such as pattern recognition or data classification. ANNs have the ability to derive meaning from complicated or imprecise data, to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques, and they offer adaptive learning and real-time operation. Neural networks are commonly used for classification problems and regression problems.

A neuron in an artificial neural network consists of:

1. A set of input values (xi) and associated weights (wi).
2. A function (g) that sums the weights and maps the results to an output (y).

Neurons are organized into layers: input, hidden and output. The input layer is composed not of full neurons, but rather consists simply of the record's values that are
inputs to the next layer of neurons. The next layer is the hidden layer. Several hidden layers can exist in one neural network. The final layer is the output layer, where there is one node for each class. A single sweep forward through the network results in the assignment of a value to each output node, and the record is assigned to the class node with the highest value.

Advantages:
3. A neural network learns and does not need to be reprogrammed.
4. It can be implemented in any application without any problem.
5. High accuracy and noise tolerance.

Disadvantages:
1. The neural network needs training to operate.
2. It requires high processing time for large neural networks.
3. Lack of transparency.
4. Learning time is long.
5. Defining classification rules is difficult.
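The neuron model described above (inputs xi, weights wi, and a function g that maps the weighted sum to an output y), together with the single forward sweep that assigns the record to the highest-valued output node, can be sketched as follows. The weights are hand-chosen illustrative values, not learned as a real network's would be:

```python
import math

def neuron(inputs, weights, bias):
    # g: weighted sum of the inputs, squashed by a sigmoid -> output y.
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-s))

def forward(record, layers):
    """One forward sweep; each layer is a list of (weights, bias) neurons."""
    values = record
    for layer in layers:
        values = [neuron(values, w, b) for w, b in layer]
    return values

# Tiny 2-input network: one hidden layer, 3 output nodes (one per
# class); the record goes to the class node with the highest value.
hidden = [([0.5, -0.4], 0.1), ([-0.3, 0.8], 0.0)]
output = [([1.0, -1.0], 0.0), ([-1.0, 1.0], 0.0), ([0.2, 0.2], -0.5)]

outputs = forward([0.9, 0.1], [hidden, output])
predicted_class = max(range(len(outputs)), key=outputs.__getitem__)
print(predicted_class)  # prints 0
```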
5. Accuracy is severely degraded by noisy or irrelevant features.

2.3.6 Naïve Bayes Classifier

The Naive Bayes classifier technique is based on the Bayesian theorem and is particularly suited when the dimensionality of the inputs is high. Despite its simplicity, Naive Bayes can often outperform more sophisticated classification methods. Bayesian classification is used as a probabilistic learning method (Naive Bayes text classification), and Naive Bayes classifiers are among the most successful known algorithms for learning to classify text documents.

The classifier predicts membership probabilities for each class, such as the probability that a given record or data point belongs to a particular class. The class with the highest probability is considered the most likely class. This is also known as Maximum A Posteriori (MAP) estimation.

The Naive Bayes classifier assumes that all the features are unrelated to each other: the presence or absence of a feature does not influence the presence or absence of any other feature.

Advantages:
1. Easy to implement.
2. Good results obtained in most of the cases.
3. It requires only a small amount of training data to estimate the parameters.
4. Bayesian classifiers have also exhibited high accuracy and speed when applied to large databases.

Disadvantages:
1. The class-conditional independence assumption leads to a loss of accuracy.
2. In practice, dependencies among variables cannot be modelled by the naïve Bayesian classifier.
3. When the conditional independence assumption is violated, that is, when the features of real-world data are highly correlated, it performs very poorly.

Support Vector Machine (SVM)

An SVM works in a high-dimensional space and uses almost all attributes. It separates the space in a single pass to generate flat and linear partitions, dividing the two categories by a clear gap that should be as wide as possible; this partitioning is done by a plane called a hyperplane.

An SVM creates the hyperplanes that have the largest margin in a high-dimensional space to separate the given data into classes. The margin between the two classes represents the longest distance between the closest data points of those classes. The larger the margin, the lower the generalization error of the classifier. After training, new data are mapped into the same space and categorized into the partitions learned from the training data.

Advantages:
1. Of all the available classifiers, SVM provides the largest flexibility.
2. High accuracy, good theoretical guarantees regarding overfitting, and, with an appropriate kernel, SVMs can work well even if the data is not linearly separable in the base feature space.
3. SVMs are like probabilistic approaches but do not consider dependencies among attributes.

Disadvantages:
1. Picking/finding the right kernel can be a challenge.
2. Results/output are incomprehensible.
3. There is no standardized way of dealing with multi-class problems; the SVM is fundamentally a binary classifier.

4. Comparison

4.1 Different algorithms that can be extended from binary
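To make the comparison of these algorithms concrete, the MAP decision rule of the Naive Bayes classifier described above (multiply the class prior by the per-feature likelihoods, assumed independent, and pick the class with the highest posterior) can be sketched as follows. This assumes categorical features and a simple add-one smoothing variant, details the text does not specify:

```python
import math
from collections import Counter, defaultdict

def train_nb(samples, labels):
    """Count class priors and per-feature value frequencies."""
    priors = Counter(labels)
    counts = defaultdict(Counter)  # (class, feature index) -> value counts
    for x, y in zip(samples, labels):
        for i, v in enumerate(x):
            counts[(y, i)][v] += 1
    return priors, counts

def predict_nb(priors, counts, x):
    total = sum(priors.values())
    best, best_logp = None, -math.inf
    for cls, n in priors.items():
        # log P(class) + sum_i log P(feature_i | class); features are
        # treated as independent, with a simple add-one smoothing.
        logp = math.log(n / total)
        for i, v in enumerate(x):
            c = counts[(cls, i)]
            logp += math.log((c[v] + 1) / (n + len(c) + 1))
        if logp > best_logp:
            best, best_logp = cls, logp
    return best  # the MAP class

samples = [("sunny", "hot"), ("sunny", "mild"),
           ("rainy", "mild"), ("rainy", "cool")]
labels  = ["no", "no", "yes", "yes"]
priors, counts = train_nb(samples, labels)
print(predict_nb(priors, counts, ("rainy", "mild")))  # prints yes
```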
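The SVM margin discussed above — the distance from the separating hyperplane to the closest data points — can also be computed directly for a fixed hyperplane. The hyperplane here is hand-chosen for illustration; an actual SVM would learn the one that maximizes this margin:

```python
import math

def margin(w, b, points):
    """Distance from the hyperplane w.x + b = 0 to the closest point."""
    norm = math.sqrt(sum(c * c for c in w))
    return min(abs(sum(wi - 0 + wi * 0 + wi * xi for wi, xi in zip(w, x)) + b)
               for x in points) / norm if False else \
           min(abs(sum(wi * xi for wi, xi in zip(w, x)) + b)
               for x in points) / norm

# Two linearly separable classes in the plane.
class_a = [(0.0, 0.0), (1.0, 0.5)]
class_b = [(3.0, 3.0), (4.0, 2.5)]

# Hand-chosen separating hyperplane x + y - 4 = 0, i.e. w = (1, 1), b = -4.
w, b = (1.0, 1.0), -4.0
print(margin(w, b, class_a + class_b))  # distance to the closest point
```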