Machine Learning
Classification is the task of sorting things into sub-categories, but done by a machine. If that doesn’t sound like much, imagine your computer being able to tell you apart from a stranger, a potato from a tomato, or an A grade from an F. Classification is a form of supervised machine learning in which we train a model on labeled data. In machine learning and statistics, classification is the problem of identifying which of a set of categories (sub-populations) a new observation belongs to, on the basis of a training set of data containing observations whose category membership is known.
Types of Classification
Binary Classification: In binary classification, the goal is to classify the input into one of two classes or categories. Example – on the basis of a person’s given health conditions, determine whether the person has a certain disease or not.
Multiclass Classification: In multiclass classification, the goal is to classify the input into one of three or more classes or categories. Example – on the basis of data about different species of flowers, determine which species an observation belongs to.
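The distinction between the two can be sketched with two of scikit-learn’s bundled datasets (assuming scikit-learn is installed): breast cancer is a binary problem (malignant vs. benign), while iris is multiclass (three flower species).

```python
# Minimal sketch: counting the distinct target labels tells us whether a
# dataset poses a binary or a multiclass classification problem.
from sklearn.datasets import load_breast_cancer, load_iris

binary = load_breast_cancer()  # two classes: malignant vs. benign
multi = load_iris()            # three classes: three iris species

print(len(set(binary.target)))  # 2 -> binary classification
print(len(set(multi.target)))   # 3 -> multiclass classification
```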
Types of classification algorithms
Linear Classifiers: Linear models create a linear decision boundary between classes. They
are simple and computationally efficient. Some of the linear classification models are as
follows:
Logistic Regression
Support Vector Machines with kernel = ‘linear’
Single-layer Perceptron
Stochastic Gradient Descent (SGD) Classifier
Non-linear Classifiers: Non-linear models create a non-linear decision boundary between classes, which lets them capture more complex relationships. Some of the non-linear classification models are as follows:
K-Nearest Neighbours
Kernel SVM
Naive Bayes
Decision Tree Classification
Ensemble learning classifiers:
Random Forests
AdaBoost
Bagging Classifier
Voting Classifier
ExtraTrees Classifier
Multi-layer Artificial Neural Networks
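As a concrete sketch of a linear classifier (assuming scikit-learn is installed), the example below fits Logistic Regression on a toy 2-D dataset; the learned weights and bias define the linear decision boundary w·x + b = 0.

```python
# Minimal sketch: a linear classifier on a toy two-cluster dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 2-D data: class 0 clustered near the origin, class 1 shifted away.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + 3.0])
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression().fit(X, y)
print("weights:", clf.coef_, "bias:", clf.intercept_)  # the linear boundary
print("training accuracy:", clf.score(X, y))
```

Because the two clusters are (mostly) linearly separable, a single straight line already classifies almost every point correctly.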
In machine learning, classification learners can also be classified as either “lazy” or “eager”
learners.
Lazy Learners: Lazy learners, also known as instance-based learners, do not build a model during the training phase. Instead, they simply store the training data and use it to classify new instances at prediction time. Training is therefore very fast, but prediction can be slow, and lazy learners tend to be less effective in high-dimensional spaces or when the number of training instances is large. Examples of lazy learners include k-nearest neighbors and case-based reasoning.
Eager Learners: Eager learners, also known as model-based learners, learn a model from the training data during the training phase and use this model to classify new instances at prediction time. Training takes longer, but prediction is fast, and eager learners tend to be more effective in high-dimensional spaces and on large training datasets. Examples of eager learners include decision trees, random forests, and support vector machines.
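The contrast can be sketched with scikit-learn (assumed installed): k-NN is lazy, so “fitting” it amounts to storing the data, while a decision tree is eager and builds its model up front.

```python
# Minimal sketch: a lazy learner (k-NN) vs. an eager learner (decision tree).
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# k-NN "training" just stores X and y; all work happens at prediction time.
lazy = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# The decision tree does its work now, building a model used later for fast predictions.
eager = DecisionTreeClassifier(random_state=0).fit(X, y)

print("k-NN training accuracy:", lazy.score(X, y))
print("tree training accuracy:", eager.score(X, y))
```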
Classification algorithms are widely used in many real-world applications across various domains. Some algorithms, however, are designed specifically for binary classification problems. Examples include:
Logistic Regression
Perceptron
Support Vector Machines
As such, they cannot be used for multi-class classification tasks, at least not directly. Instead, heuristic methods can be used to split a multi-class classification problem into multiple binary classification datasets and to train one binary classification model on each. Two such heuristics are:
One-vs-Rest (OvR)
One-vs-One (OvO)
One-vs-rest (OvR for short, also referred to as One-vs-All or OvA) is a heuristic method for
using binary classification algorithms for multi-class classification.
It involves splitting the multi-class dataset into multiple binary classification problems. A
binary classifier is then trained on each binary classification problem and predictions are
made using the model that is the most confident.
For example, given a multi-class classification problem with examples for each of the classes ‘red,’ ‘blue,’ and ‘green,’ the dataset could be divided into three binary classification datasets as follows:
Binary classification problem 1: red vs. [blue, green]
Binary classification problem 2: blue vs. [red, green]
Binary classification problem 3: green vs. [red, blue]
A possible downside of this approach is that it requires one model to be created for each
class. For example, three classes requires three models. This could be an issue for large
datasets (e.g. millions of rows), slow models (e.g. neural networks), or very large numbers of
classes (e.g. hundreds of classes).
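The one-vs-rest strategy can be sketched with scikit-learn’s `OneVsRestClassifier` wrapper (assuming scikit-learn is installed): it wraps a binary classifier and fits one copy of it per class.

```python
# Minimal sketch: one-vs-rest on a three-class problem.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # three classes

# The wrapper trains one binary model per class: class i vs. everything else.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print("underlying binary models:", len(ovr.estimators_))  # one per class
```

At prediction time the wrapper scores each underlying binary model and reports the class whose model is most confident, matching the description above.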
One-vs-One (OvO for short) is another heuristic method for using binary classification
algorithms for multi-class classification.
For example, consider a multi-class classification problem with four classes: ‘red,’ ‘blue,’ ‘green,’ and ‘yellow.’ This could be divided into six binary classification datasets as follows:
Binary classification problem 1: red vs. blue
Binary classification problem 2: red vs. green
Binary classification problem 3: red vs. yellow
Binary classification problem 4: blue vs. green
Binary classification problem 5: blue vs. yellow
Binary classification problem 6: green vs. yellow
This is significantly more datasets, and in turn, models than the one-vs-rest strategy
described in the previous section.
The formula for calculating the number of binary datasets, and in turn models, is as follows:
(NumClasses * (NumClasses – 1)) / 2
We can see that for four classes, this gives us the expected value of six binary classification problems:
(4 * (4 – 1)) / 2
= (4 * 3) / 2
= 12 / 2
= 6
At prediction time, each binary classification model predicts one class label, and the class that receives the most predictions or votes is the one output by the one-vs-one strategy.
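The pair-count formula and the voting step can both be sketched in plain Python. The six pairwise votes below are hypothetical, chosen only to illustrate the majority vote on the four-class example above.

```python
# Minimal sketch of one-vs-one counting and voting, using only the stdlib.
from collections import Counter

def n_pairs(n_classes):
    # Number of one-vs-one binary problems: n * (n - 1) / 2.
    return n_classes * (n_classes - 1) // 2

print(n_pairs(4))  # 6, matching the worked example above

# Hypothetical predictions for one test example: each of the six pairwise
# models (red/blue, red/green, red/yellow, blue/green, blue/yellow,
# green/yellow) votes for one of the two classes it was trained on.
pairwise_votes = ["red", "red", "yellow", "blue", "yellow", "yellow"]

winner, count = Counter(pairwise_votes).most_common(1)[0]
print(winner, count)  # 'yellow' wins with 3 of 6 votes
```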