
International Journal of Pure and Applied Mathematics
Volume 118 No. 14 2018, 17-26
ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version)
url: http://www.ijpam.eu
Special Issue

A REVIEW OF MULTI-CLASS CLASSIFICATION ALGORITHMS

Chaitra P.C.1, Dr. R. Saravana Kumar2

1,2 Department of Computer Science and Engineering
1,2 Dayananda Sagar Academy of Technology and Management, Bengaluru, India
[email protected], [email protected]

Abstract: Classification is one of the crucial tasks of data mining, and many machine learning algorithms are inherently designed for binary decision problems. Classification is a complex process that may be affected by many factors. This paper examines current practices, problems, and prospects of multi-class classification. In several application domains such as biology, computer vision, social network analysis and information retrieval, multi-class classification problems arise in which data instances do not simply belong to one particular class, but exhibit a partial membership to several classes. The emphasis is placed on the summarization of major advanced classification approaches and the techniques used for improving classification accuracy in multi-class classification for different datasets.

Keywords: Data Mining, Multi-class Classification

1. Introduction

Data mining is the computing process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. It is an essential process where intelligent methods are applied to extract data patterns. It is an interdisciplinary subfield of computer science. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.

Data mining is a collection of techniques for efficient automated discovery of previously unknown, valid, novel, useful and understandable patterns in large databases. Conventionally, the information that is mined is denoted as a model of the semantic structure of the datasets. The model might be utilized for prediction and categorization of new data. In recent years the sizes of databases have increased rapidly. This has led to a growing interest in the development of tools capable of automatically extracting knowledge from data. The term Data Mining, or Knowledge Discovery in Databases, has been adopted for a field of research dealing with the automatic discovery of implicit information or knowledge within databases [16]. Diverse fields such as marketing, customer relationship management, engineering, medicine, crime analysis, expert prediction, web mining and mobile computing, among others, utilize data mining.

Data mining involves six common classes of tasks:

• Anomaly detection (outlier/change/deviation detection) – The identification of unusual data records that might be interesting, or data errors that require further investigation.
• Association rule learning (dependency modelling) – Searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
• Clustering – The task of discovering groups and structures in the data that are in some way or another "similar", without using known structures in the data.
• Classification – The task of generalizing known structure to apply to new data. For example, an e-mail program might attempt to classify an e-mail as "legitimate" or as "spam".
• Regression – Attempts to find a function which models the data with the least error; that is, for estimating the relationships among data or datasets.
• Summarization – Providing a more compact representation of the data set, including visualization and report generation.

In this paper we concentrate on the classification of data. Classification is one of the fundamental and most important tasks of data mining and machine learning. Databases are rich with hidden information, which can be used for intelligent decision making. Classification and prediction are two forms of data analysis that can be used to extract models describing important data classes or to predict future data trends. Classification is a data mining (machine learning) technique used to predict group membership for data instances. Machine learning refers to a system that has the capability to automatically learn knowledge from experience and other ways.


Classification predicts categorical labels, whereas prediction models continuous-valued functions. Classification is the task of generalizing known structure to apply to new data, while clustering is the task of discovering groups and structures in the data that are in some way or another similar, without using known structures in the data.

In machine learning, the problem of classification is encountered in various areas, such as medicine, to identify the disease of a patient, or industry, to decide whether a defect has appeared or not, or to decide whether the temperature is low, middle or high.

Classification is divided into two categories: binary classifiers and multi-class classifiers.

Binary or binomial classification is the task of classifying the elements of a given set into two groups (predicting which group each one belongs to) on the basis of a classification rule.

Multiclass or multinomial classification is the problem of classifying instances into one of three or more classes. (Classifying instances into one of two classes is called binary classification.)

Classification also takes place in three ways: supervised, unsupervised and semi-supervised.

Supervised:

All data is labeled and the algorithms learn to predict the output from the input data. Supervised learning is where you have input variables (X) and an output variable (Y) and you use an algorithm to learn the mapping function from the input to the output:

Y = f(X)

Supervised learning problems can be further grouped into regression and classification problems.

• Classification: A classification problem is when the output variable is a category, such as "red" or "blue", or "disease" and "no disease".
• Regression: A regression problem is when the output variable is a real value, such as "dollars" or "weight".

Some popular examples of supervised machine learning algorithms are:

• Linear regression for regression problems.
• Random forest for classification and regression problems.
• Support vector machines for classification problems.

A minimal sketch of this setting follows.
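To make the supervised setting concrete, the following sketch learns the mapping Y = f(X) from labeled data and applies it to unseen inputs; the use of scikit-learn's random forest and the toy data are illustrative assumptions, not part of the original paper.

```python
# Supervised learning sketch: learn the mapping Y = f(X) from labeled data.
# The random-forest choice and the toy data are illustrative assumptions.
from sklearn.ensemble import RandomForestClassifier

X = [[0.1, 1.2], [1.0, 0.9], [0.2, 1.1], [1.1, 1.0]]  # input variables (X)
y = ["red", "blue", "red", "blue"]                    # output variable (Y)

model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(X, y)                        # learn f from the labeled examples
print(model.predict([[0.15, 1.15]]))   # apply f to unseen data
```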


Unsupervised:

All data is unlabeled and the algorithms learn the inherent structure from the input data. Unsupervised learning is where you only have input data (X) and no corresponding output variables.

Unsupervised learning problems can be further grouped into clustering and association problems.

• Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behaviour.
• Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people that buy X also tend to buy Y.

Some popular examples of unsupervised learning algorithms are:

• k-means for clustering problems.
• Apriori algorithm for association rule learning problems.

Semi-supervised:

Some data is labeled but most of it is unlabeled, and a mixture of supervised and unsupervised techniques can be used.

Semi-supervised learning refers to the use of both labeled and unlabeled data for training. Many machine-learning researchers have found that unlabeled data, when used in conjunction with a small amount of labeled data, can produce considerable improvement in learning accuracy. Labelled instances, however, are often difficult, expensive, or time-consuming to obtain, as they require the efforts of experienced human annotators. Meanwhile, unlabeled data may be relatively easy to collect, but there have been few ways to use it. Semi-supervised learning addresses this problem by using a large amount of unlabeled data, together with the labelled data, to build better classifiers; it requires less human effort and gives higher accuracy.

Some popular examples of semi-supervised methods are EM with generative mixture models, self-training, co-training, transductive support vector machines and graph-based methods.

2. Methods To Multiclass Classification

There are three groups of methods to solve multiclass classification problems:

1. Extension from the binary case
2. Converting the multiclass classification problem into several binary classification problems
3. Hierarchical classification methods

Extension from binary refers to strategies for extending existing binary classifiers to solve multi-class classification problems. Several algorithms have been developed based on neural networks, decision trees, nearest neighbours, naive Bayes, support vector machines and extreme learning machines to address multi-class classification problems.

2.1 Hierarchical Classification

Hierarchical classification methods differ in a number of criteria. The first criterion is the type of hierarchical structure used. This structure is based on the problem structure and is typically either a tree or a DAG.

We can build such a hierarchy with binary classifiers: if we have a large number of classes, we can divide them into two sets of classes, say A and B. Then we can divide the classes in A into two smaller sets of classes, divide B into two smaller sets of classes, and so on. To run our multi-class classification, we would first train a binary classifier to determine whether a new data point is in some class in A or in some class in B. We then train a second binary classifier to determine which of the two subsets of A a point is in, and a third classifier to determine which of the subsets of B a point is in. We continue all the way down, until we get to classifiers that distinguish individual classes. This is called hierarchical classification because the different steps in the scheme form a sort of hierarchy, from the first question (the CEO) to the second-level questions (the vice-presidents) and down to the final questions (the mailroom clerks) that distinguish individual classes.

The tricky thing is that how you choose to construct the hierarchy can have a big impact on how effective the final classifier is.

Hierarchical classification tackles the multi-class classification problem by dividing the output space into a tree. Each parent node is divided into multiple child nodes, and the process is continued until each child node represents only one class. Several methods have been proposed based on hierarchical classification, e.g. the Binary Hierarchical Classifier and Divide-By-2.

Advantages

• Simplicity
• Easy to understand
• There is a clear reporting structure

Disadvantages

• Depending on the problem at hand, it can create a very complex cascade of classifiers, which in turn leads to a complex classification model
• Misclassification at a given class node is propagated downwards to all its descendant classes

A sketch of the tree-of-binary-classifiers idea follows.
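The hierarchical scheme described above can be sketched recursively: split the label set in half, train a binary classifier at each internal node, and route a test point down to a leaf. The recursive layout and the LinearSVC base learner are illustrative assumptions, not the paper's own implementation.

```python
# Hierarchical multi-class sketch: a binary tree of binary classifiers.
# The recursive layout and LinearSVC base learner are illustrative choices.
import numpy as np
from sklearn.svm import LinearSVC

def build_node(X, y, classes):
    if len(classes) == 1:                      # leaf: one class remains
        return classes[0]
    a, b = classes[:len(classes) // 2], classes[len(classes) // 2:]
    in_a = np.isin(y, a)                       # "is the point in A or in B?"
    clf = LinearSVC().fit(X, in_a.astype(int))
    return (clf,
            build_node(X[in_a], y[in_a], a),   # subtree distinguishing A's classes
            build_node(X[~in_a], y[~in_a], b)) # subtree distinguishing B's classes

def predict_one(node, x):
    if not isinstance(node, tuple):            # reached a leaf class label
        return node
    clf, left, right = node
    branch = left if clf.predict(x.reshape(1, -1))[0] == 1 else right
    return predict_one(branch, x)
```

How the classes are split into A and B at each level is exactly the design choice the text warns about: a poor split can noticeably hurt the final classifier.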


2.2 Transformation To Binary

This family of methods reduces the multiclass classification problem to multiple binary classification problems. It can be categorized into the One vs Rest and One vs One approaches.

2.2.1 One against All Approach (OVA)

Suppose the dataset is to be classified into K classes. Then K binary SVM classifiers may be created, where each classifier is trained to distinguish one class from the remaining K-1 classes. For this approach, we require N = K binary classifiers, where the kth classifier is trained with positive examples belonging to class k and negative examples belonging to the other K − 1 classes.

During testing, samples are classified by finding the margin from the linear separating hyperplane. The final output is the class that corresponds to the SVM with the largest margin. However, if the outputs corresponding to two or more classes are very close to each other, those points are labeled as unclassified. This multiclass method has the advantage that the number of binary classifiers to construct equals the number of classes. However, there are some drawbacks:

• During the training phase, the memory requirement is very high, amounting to the square of the total number of training samples. This may cause problems for large training data sets and may lead to computer memory problems.
• Suppose there are K classes and each has an equal number of training samples. During the training phase, the ratio of training samples of one class to the rest of the classes will be 1:(K − 1). This ratio shows that training sample sizes will be unbalanced.

A minimal sketch of the OVA scheme follows.
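The following sketch assumes scikit-learn's LinearSVC as the base binary learner: K classifiers are trained, and the class whose classifier reports the largest signed margin wins.

```python
# One-against-all (OVA) sketch: one binary classifier per class, largest
# margin wins. LinearSVC as the base learner is an illustrative choice.
import numpy as np
from sklearn.svm import LinearSVC

def ova_train(X, y, classes):
    # kth classifier: class k positive, the remaining K-1 classes negative
    return {k: LinearSVC().fit(X, (y == k).astype(int)) for k in classes}

def ova_predict(models, X_new):
    labels = list(models)
    margins = np.stack([models[k].decision_function(X_new) for k in labels])
    return [labels[i] for i in np.argmax(margins, axis=0)]  # largest margin wins
```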

2.2.2 One against One Approach (OAO)

In this method, SVM classifiers are created for all possible pairs of classes. Therefore, for K classes, there will be K(K − 1)/2 binary classifiers, where each one is trained on data from two classes. The output from each classifier is obtained in the form of a class label, and the class label that occurs most often is assigned to that point in the samples. The number of classifiers created by this method is generally much larger than in the previous method; however, the number of training data vectors required for each classifier is much smaller. For training data from the ith and the jth classes, we solve the corresponding binary classification problem. When testing a new example, a vote is performed among the classifiers, and the class with the maximum number of votes wins.

The main disadvantage of this method is the increase in the number of classifiers as the number of classes increases.

Advantage: it gives better results than the one-against-all approach.

A minimal sketch of the OAO scheme follows.
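A matching sketch of the OAO scheme: K(K − 1)/2 pairwise classifiers, with a majority vote at prediction time. The LinearSVC base learner is again an illustrative assumption.

```python
# One-against-one (OAO) sketch: K(K-1)/2 pairwise classifiers plus voting.
from collections import Counter
from itertools import combinations
from sklearn.svm import LinearSVC

def oao_train(X, y, classes):
    models = {}
    for i, j in combinations(classes, 2):     # every pair of classes
        mask = (y == i) | (y == j)            # training data from classes i, j only
        models[(i, j)] = LinearSVC().fit(X[mask], y[mask])
    return models

def oao_predict_one(models, x):
    votes = Counter(m.predict(x.reshape(1, -1))[0] for m in models.values())
    return votes.most_common(1)[0][0]         # class with the most votes wins
```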


2.3 Extended From Binary Case

2.3.1 Neural Networks

An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by biological nervous systems. It is composed of a large number of highly interconnected processing elements called neurons. An ANN is configured for a specific application, such as pattern recognition or data classification.

ANNs have the ability to derive meaning from complicated or imprecise data, and to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques; they also offer adaptive learning and real-time operation. Neural networks are commonly used for

• classification problems and
• regression problems.

A neuron in an artificial neural network comprises

1. A set of input values (xi) and associated weights (wi).
2. A function (g) that sums the weighted inputs and maps the result to an output (y).

Neurons are organized into layers: input, hidden and output. The input layer is composed not of full neurons, but rather consists simply of the record's values that are inputs to the next layer of neurons. The next layer is the hidden layer; several hidden layers can exist in one neural network. The final layer is the output layer, where there is one node for each class. A single sweep forward through the network results in the assignment of a value to each output node, and the record is assigned to the class node with the highest value.

A key feature of neural networks is an iterative learning process in which records (rows) are presented to the network one at a time, and the weights associated with the input values are adjusted each time. After all cases are presented, the process is often repeated. During this learning phase, the network trains by adjusting the weights to predict the correct class label of input samples. Advantages of neural networks include their high tolerance to noisy data, as well as their ability to classify patterns on which they have not been trained. The most popular neural network algorithm is the back-propagation algorithm, proposed in the 1980s.

Once a network has been structured for a particular application, that network is ready to be trained. To start this process, the initial weights are chosen randomly. Then the training (learning) begins.

The network processes the records in the training set one at a time, using the weights and functions in the hidden layers, then compares the resulting outputs against the desired outputs. Errors are then propagated back through the system, causing the system to adjust the weights for application to the next record. This process occurs repeatedly as the weights are tweaked. During the training of a network, the same set of data is processed many times as the connection weights are continually refined.

Advantages:
1. A neural network can perform tasks that a linear program cannot.
2. When an element of the neural network fails, it can continue without any problem, owing to its parallel nature.
3. A neural network learns and does not need to be reprogrammed.
4. It can be implemented in any application without any problem.
5. High accuracy and noise tolerance.

Disadvantages:
1. The neural network needs training to operate.
2. It requires high processing time for large neural networks.
3. Lack of transparency.
4. Learning time is long.
5. Defining classification rules is difficult.

A minimal sketch of a single neuron follows.
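The neuron described above (inputs xi, weights wi, a summing function g, output y) can be written directly; the sigmoid choice for g is an illustrative assumption.

```python
# Single-neuron sketch: y = g(sum_i w_i * x_i + b), with a sigmoid as g.
import math

def neuron(x, w, b=0.0):
    s = sum(wi * xi for wi, xi in zip(w, x))  # weighted sum of the inputs
    return 1.0 / (1.0 + math.exp(-(s + b)))   # activation g maps the sum to y

print(neuron(x=[0.5, -1.0, 2.0], w=[0.4, 0.3, 0.1]))  # -> a value in (0, 1)
```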


2.3.2 Logistic Regression

Multinomial logistic regression is a classification method that generalizes logistic regression to multiclass problems, i.e. those with more than two possible discrete outcomes. That is, it is a model that is used to predict the probabilities of the different possible outcomes of a categorically distributed dependent variable, given a set of independent variables.

Multinomial logistic regression is a particular solution to the classification problem that assumes that a linear combination of the observed features and some problem-specific parameters can be used to determine the probability of each particular outcome of the dependent variable. The best values of the parameters for a given problem are usually determined from some training data. The difference between the multinomial logit model and the numerous other methods with the same basic setup (the perceptron algorithm, support vector machines, linear discriminant analysis, etc.) is the procedure for determining (training) the optimal weights/coefficients and the way the score is interpreted. In particular, in the multinomial logit model, the score can be directly converted to a probability value, indicating the probability of observation i choosing outcome k given the measured characteristics of the observation.

Advantages:
1. Smooth function
2. Gives probability estimates
3. Simple and fast
4. Computationally inexpensive

Drawbacks:
1. Limited expressive power
2. Fundamentally a binary classifier
3. Hard to update incrementally
4. Prone to underfitting, and may have low accuracy

A sketch of the score-to-probability conversion follows.
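The score-to-probability conversion that distinguishes the multinomial logit model can be sketched with the softmax function; the toy weight matrix stands in for trained coefficients.

```python
# Multinomial logit sketch: linear scores converted to class probabilities.
import numpy as np

def predict_proba(x, W, b):
    scores = W @ x + b              # one linear score per class
    scores -= scores.max()          # stabilise the exponentials
    e = np.exp(scores)
    return e / e.sum()              # probabilities over classes, summing to 1

W = np.array([[0.2, -0.5], [1.0, 0.3], [-0.4, 0.8]])  # toy: 3 classes, 2 features
print(predict_proba(np.array([1.0, 2.0]), W, np.zeros(3)))
```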
2.3.3 Multiclass Perceptron

The perceptron is an algorithm for supervised learning of binary classifiers. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector. The algorithm allows for online learning, in that it processes elements in the training set one at a time. Like most other techniques for training linear classifiers, the perceptron generalizes naturally to multiclass classification.

Advantages:
1. Extremely simple updates (no gradient to calculate)
2. No need to have all the data in memory (some points stay correctly classified after a while)

Drawbacks:
1. If the data is not separable, convergence is slow.

A sketch of the multiclass perceptron update follows.
2.3.4 Decision Trees

A decision tree builds classification or regression models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision node has two or more branches; a leaf node represents a classification or decision. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.

Problem: overfitting. A tree may split and classify the training set very well, but do worse in terms of generalization error. Solutions to the overfitting problem:

Solution 1: Prune branches of the tree built in the first phase, using a validation set to test for overfitting.

Solution 2: Test for overfitting during the tree-building phase, and stop building the tree when performance on the validation set deteriorates.

Advantages
1. They are computationally simple to understand and interpret.
2. They can handle both numerical and categorical data.
3. They perform well on large data in a short time.
4. Easy to understand.
5. Easy to generate rules.

Disadvantages
1. They can create over-complex trees that do not generalize the data well.
2. For data including categorical variables with different numbers of levels, information gain in decision trees is biased in favour of attributes with more levels. [6]
3. Calculations can get very complex, particularly if many values are uncertain and/or many outcomes are linked.
4. They may suffer from overfitting.

A sketch of validation-based pruning follows.
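The validation-based approach of Solutions 1 and 2 can be sketched with scikit-learn's cost-complexity pruning; the dataset and the candidate alpha values are illustrative assumptions.

```python
# Decision tree sketch: pick the pruning level on a held-out validation set.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Grow trees at several pruning strengths; keep the one that does best on
# the validation set (i.e. "test for the overfit" as in Solution 1).
best = max(
    (DecisionTreeClassifier(ccp_alpha=a, random_state=0).fit(X_tr, y_tr)
     for a in [0.0, 0.005, 0.01, 0.02, 0.05]),
    key=lambda t: t.score(X_val, y_val),
)
print(best.get_depth(), best.score(X_val, y_val))
```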


2.3.5 K-Nearest Neighbor (kNN)

k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. In pattern recognition, the k-nearest neighbors algorithm is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space.

The principle behind nearest neighbor methods is to find a predefined number of training samples closest in distance to the new point, and predict the label from these. The number of samples can be a user-defined constant or can vary based on the local density of points (radius-based neighbor learning). The distance can, in general, be any metric measure; the standard Euclidean distance is the most common choice.

Advantages
1. A simple technique that is easily implemented.
2. Building the model is cheap.
3. An extremely flexible classification scheme.
4. Robust in the sense of not requiring the categories to be linearly separable.

Disadvantages
1. High computational cost.
2. Classification time is long.
3. It is difficult to find the optimal value of k.
4. Classifying unknown records is relatively expensive.
5. Accuracy is severely degraded by noisy or irrelevant features.

A from-scratch sketch of the nearest-neighbour rule follows.
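The sketch below uses the standard Euclidean distance and a majority vote, as described above; the choice k = 3 is illustrative.

```python
# k-NN sketch: lazy learning, all computation deferred until classification.
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to x
    nearest = np.argsort(dists)[:k]               # indices of the k closest samples
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]             # majority label among neighbours
```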
2.3.6 Naïve Bayes Classifier

The Naive Bayes classifier technique is based on the Bayesian theorem and is particularly suited to cases where the dimensionality of the inputs is high. Despite its simplicity, Naive Bayes can often outperform more sophisticated classification methods. Bayesian classification is used as a probabilistic learning method (Naive Bayes text classification); Naive Bayes classifiers are among the most successful known algorithms for learning to classify text documents.

The classifier predicts membership probabilities for each class, such as the probability that a given record or data point belongs to a particular class. The class with the highest probability is considered the most likely class. This is also known as Maximum A Posteriori (MAP) classification.

The Naive Bayes classifier assumes that all the features are unrelated to each other: the presence or absence of a feature does not influence the presence or absence of any other feature.

Advantages
1. Easy to implement.
2. Good results are obtained in most cases.
3. It requires only a small amount of training data to estimate the parameters.
4. Bayesian classifiers have also exhibited high accuracy and speed when applied to large databases.

Disadvantages
1. The class-conditional independence assumption entails a loss of accuracy.
2. In practice, dependencies among variables cannot be modelled by the naïve Bayesian classifier.
3. When the conditional independence assumption is violated by real-world data, the classifier performs very poorly on highly correlated features.

A sketch of the MAP rule under the naive assumption follows.
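The MAP rule under the naive independence assumption can be sketched for continuous features; modelling each feature with a per-class Gaussian is an illustrative assumption, since the paper does not fix a likelihood model.

```python
# Gaussian naive Bayes sketch: MAP class = argmax_c P(c) * prod_i P(x_i | c),
# with every feature modelled independently (the "naive" assumption).
import numpy as np

def nb_fit(X, y):
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),        # prior P(c)
                     Xc.mean(axis=0),         # per-feature means
                     Xc.var(axis=0) + 1e-9)   # per-feature variances (smoothed)
    return params

def nb_predict(params, x):
    def log_posterior(c):
        prior, mu, var = params[c]
        ll = -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        return np.log(prior) + ll.sum()       # log P(c) + sum_i log P(x_i | c)
    return max(params, key=log_posterior)     # the MAP class
```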

2.3.7 Support Vector Machine

SVM stands for Support Vector Machine. It is a machine learning approach used for classification and regression analysis. It depends on supervised learning models and is trained by learning algorithms. SVMs analyze large amounts of data to identify patterns.

An SVM generates parallel partitions by generating two parallel lines, one for each category of data, in a high-dimensional space, and uses almost all attributes. It separates the space in a single pass to generate flat and linear partitions, dividing the two categories by a clear gap that should be as wide as possible. This partitioning is done by a plane called a hyperplane.

An SVM creates hyperplanes that have the largest margin in a high-dimensional space to separate the given data into classes. The margin between the two classes represents the longest distance between the closest data points of those classes. The larger the margin, the lower the generalization error of the classifier. After training, new data is mapped into the same space to predict which category it belongs to.

Advantages
1. Of all the available classifiers, the SVM provides the largest flexibility.
2. High accuracy, nice theoretical guarantees regarding overfitting, and, with an appropriate kernel, SVMs can work well even if the data isn't linearly separable in the base feature space.
3. SVMs are like probabilistic approaches but do not consider dependencies among attributes.

Disadvantages
1. Picking/finding the right kernel can be a challenge.
2. Results/output are incomprehensible.
3. There is no standardized way of dealing with multi-class problems; the SVM is fundamentally a binary classifier.

A minimal kernel-SVM sketch follows.
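A brief sketch of the maximum-margin idea with a kernel, assuming scikit-learn's SVC; for multiclass data, the library reduces the problem to pairwise binary problems internally, in the spirit of the OAO scheme of Section 2.2.2.

```python
# Kernel SVM sketch: fit a maximum-margin classifier with an RBF kernel.
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # kernel choice is illustrative
clf.fit(X, y)            # larger margin -> lower expected generalization error
print(clf.score(X, y))   # accuracy of the fitted classifier on its training data
```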


4. Comparison

4.1 Different algorithms that can be extended from the binary case

[Comparison table not reproduced here; it appears as an image in the original publication.]

4.2 Different algorithms based on transformation to binary

[Comparison table not reproduced here; it appears as an image in the original publication.]

5. Conclusion

This paper focuses on different supervised classification techniques for multiclass classification. It explains how binary classification methods can be extended to solve the multiclass problem, and how a multiclass problem can be reduced to multiple binary-class problems. It also explains how classes can be arranged in a tree, usually a binary tree, and how a number of binary classifiers can be used at the nodes of the tree until a leaf node is reached. Decision trees are fast to train and easy to evaluate and interpret. Support vector machines give good accuracy and the flexibility that comes from kernels; they work very well in many circumstances and perform very well with large amounts of data. Neural networks are slow to converge and their parameters are hard to set, but if tuned with care they work well. Bayesian classifiers are easy to understand. Decision trees and rule-based algorithms are good because you can understand the model that was built for classifying, unlike with neural networks. The Naive Bayes mechanism is very simple to understand, has high performance and is also easy to implement. The one-versus-all method yields the best computational efficiency, while the one-versus-one methods are preferred in terms of predictive performance, especially when the observed class memberships are heavily unbalanced.

6. References

[1] R. K. Agrawal, "Data Mining Techniques for Malware Detection", School of Computer and Systems Sciences, Jawaharlal Nehru University; Oliver Sutton, Introduction to k Nearest Neighbour Classification and Condensed Nearest Neighbour Data Reduction.

[2] Daniel Svozil, Vladimír Kvasnička, Jiří Pospíchal, Introduction to multi-layer feed-forward neural networks, Chemometrics and Intelligent Laboratory Systems 39 (1997) 43-62.

[3] Ravindra Changala, Annapurna Gummadi, G. Yedukondalu, UNPG Raju, Classification by Decision Tree Induction Algorithm to Learn Decision Trees from the Class-Labeled Training Tuples, International Journal of Advanced Research in Computer Science and Software Engineering, Volume 2, Issue 4, April 2012.

[4] Yun-lei Cai, Duo Ji, Dong-feng Cai, A KNN Research Paper Classification Method Based on Shared Nearest Neighbor; Irina Rish, An empirical study of the naive Bayes classifier, IJCAI Workshop on Empirical Methods in Artificial Intelligence, 2001.

[5] Simon Lacoste-Julien, "Multi-Class and Structured Classification", Machine Learning Workshop, 8/24/07.

[6] Corinna Cortes and Vladimir Vapnik, Support-vector networks, Machine Learning, pages 273–297, 1995.

[7] R. Tibshirani, et al., "Diagnosis of multiple cancer types by shrunken centroids of gene expression", Proc. Natl Acad. Sci. USA, 2002, 99:6567-6572.

[8] Tomer Hertz, Aharon Bar-Hillel and Daphna Weinshall, "Learning a Kernel Function for Classification with Small Training Samples", School of Computer Science and Engineering, The Center for Neural Computation, The Hebrew University of Jerusalem, Jerusalem, Israel.

[9] J. Weston, C. Watkins, Multi-class support vector machines, Technical Report CSD-TR-98-04, Department of Computer Science, Royal Holloway, University of London, Egham, UK, 1998.

[10] Erin Allwein, Robert Schapire, and Yoram Singer, Reducing multiclass to binary: A unifying approach for margin classifiers, Journal of Machine Learning Research, pages 113–141, 2000.

[11] R. Rifkin, A. Klautau, In defense of one-versus-all classification, Journal of Machine Learning Research 5 (2004) 101–143.

[12] Chih-Wei Hsu and Chih-Jen Lin, "A Comparison of Methods for Multiclass Support Vector Machines", IEEE Transactions on Neural Networks, vol. 13, no. 2, March 2002.


[13] T. G. Dietterich and G. Bakiri, Solving multiclass learning problems via error-correcting output codes, Journal of Artificial Intelligence Research, 2:263–286, 1995.

[14] Volkan Vural and Jennifer G. Dy, A hierarchical method for multiclass support vector machines, in Proceedings of the Twenty-First International Conference on Machine Learning, pages 105–112, 2004.

[15] J. Demšar, Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research 7 (2006) 1–30.

[16] S. Kumar, J. Ghosh, M. M. Crawford, Hierarchical fusion of multiple classifiers for hyperspectral data analysis, Pattern Analysis & Applications, 5:210-220, 2002.

[17] V. Vural, J. G. Dy, A hierarchical method for multi-class support vector machines, in Proceedings of the Twenty-First International Conference on Machine Learning, 105-112, 2004.

[18] Y. Guermeur, VC theory of large margin multi-category classifiers, Journal of Machine Learning Research 8 (2007) 2551–2594.

[19] Milos Hauskrecht, CS 2750 Machine Learning, Lecture 13: "Multiclass classification. Decision trees", 5329 Sennott Square.

[20] Mahesh Pal, Multiclass Approaches for Support Vector Machine Based Land Cover Classification, Department of Civil Engineering, National Institute of Technology Kurukshetra, 136119, Haryana, India, 2008.
