Machine Learning

What is Classification?

Classification is the task of “classifying things” into sub-categories, but done by a machine. If that doesn’t sound like much, imagine your computer being able to differentiate between you and a stranger, between a potato and a tomato, or between an A grade and an F. Now it sounds interesting. Classification is part of supervised machine learning, in which we provide labeled data for training. In machine learning and statistics, classification is the problem of identifying which of a set of categories (sub-populations) a new observation belongs to, on the basis of a training set of data containing observations whose category membership is known.

Classification is the process of categorizing data or objects into predefined classes or categories based on their features or attributes. In machine learning, classification is a type of supervised learning technique in which an algorithm is trained on a labeled dataset to predict the class or category of new, unseen data. The main objective of classification is to build a model that can accurately assign a label or category to a new observation based on its features. For example, a classification model might be trained on a dataset of images labeled as either dogs or cats and then used to predict the class of new, unseen images of dogs or cats based on features such as color, texture, and shape.
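
As a concrete illustration, here is a minimal sketch of that workflow using scikit-learn; the dataset, model choice, and split ratio are illustrative assumptions, not prescribed by the text above:

# A minimal classification workflow: train on labeled data, predict on unseen data.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Labeled dataset: feature measurements and the class label of each observation.
X, y = load_iris(return_X_y=True)

# Hold out part of the data to stand in for "new, unseen" observations.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Train the model on the labeled training data.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Assign a class label to each unseen observation based on its features.
print(clf.predict(X_test[:5]))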

Types of Classification

Classification is of two types:

Binary Classification: In binary classification, the goal is to classify the input into one of two classes or categories. Example – on the basis of a person’s given health conditions, we have to determine whether the person has a certain disease or not.

Multiclass Classification: In multiclass classification, the goal is to classify the input into one of several classes or categories. Example – on the basis of data about different species of flowers, we have to determine which species our observation belongs to.
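
To make the distinction concrete, here is a small sketch; the label encodings are illustrative assumptions:

# Binary classification: exactly two possible labels.
import numpy as np
y_binary = np.array([0, 1, 1, 0, 1])       # e.g., 0 = healthy, 1 = has the disease
print(np.unique(y_binary))                  # [0 1] -> two classes

# Multiclass classification: one of several possible labels.
y_multi = np.array([0, 2, 1, 2, 0, 1])      # e.g., three flower species
print(np.unique(y_multi))                   # [0 1 2] -> three classes
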
Types of Classification Algorithms

There are various types of classifiers. Some of them are:

Linear Classifiers: Linear models create a linear decision boundary between classes. They are simple and computationally efficient. Some common linear classification models are listed below, followed by a short code sketch:

 Logistic Regression
 Support Vector Machines with kernel = ‘linear’
 Single-layer Perceptron
 Stochastic Gradient Descent (SGD) Classifier
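
A minimal sketch fitting two of these linear models with scikit-learn; the synthetic dataset and parameters are illustrative assumptions:

# Two linear classifiers: each learns a linear decision boundary.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

log_reg = LogisticRegression().fit(X, y)                      # logistic regression
sgd = SGDClassifier(loss="hinge", random_state=0).fit(X, y)   # linear SVM trained with SGD

print(log_reg.coef_)   # the learned boundary is a linear function of these weights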

Non-linear Classifiers: Non-linear models create a non-linear decision boundary between classes. They can capture more complex relationships between the input features and the target variable. Some common non-linear classification models are listed below, followed by a short code sketch:

 K-Nearest Neighbours
 Kernel SVM
 Naive Bayes
 Decision Tree Classification
 Ensemble learning classifiers:
    Random Forests
    AdaBoost
    Bagging Classifier
    Voting Classifier
    ExtraTrees Classifier
 Multi-layer Artificial Neural Networks
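
A minimal sketch of non-linear classifiers on data that a single linear boundary cannot separate well; the dataset and parameters are illustrative assumptions:

# Non-linear classifiers on data with a curved class boundary.
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)   # k-nearest neighbours
rbf_svm = SVC(kernel="rbf").fit(X, y)                 # kernel SVM (non-linear)

print(knn.score(X, y), rbf_svm.score(X, y))           # training accuracy of each model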

Types of Learners in Classification Algorithms

In machine learning, classification algorithms can also be grouped into “lazy” and “eager” learners.

 Lazy Learners: Also known as instance-based learners, lazy learners do not learn a model during the training phase. Instead, they simply store the training data and use it to classify new instances at prediction time. Training is therefore very fast, but prediction can be slow because most of the computation is deferred until a prediction is requested. Lazy learners are also less effective in high-dimensional spaces or when the number of training instances is large. Examples of lazy learners include k-nearest neighbours and case-based reasoning.
 Eager Learners: Also known as model-based learners, eager learners build a model from the training data during the training phase and use that model to classify new instances at prediction time. Training takes longer, but prediction is fast, and eager learners tend to be more effective in high-dimensional spaces and with large training datasets. Examples of eager learners include decision trees, random forests, and support vector machines. The sketch below contrasts the two styles.
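
A minimal sketch contrasting a lazy learner (k-nearest neighbours) with an eager learner (a decision tree); the model choices and dataset are illustrative assumptions:

# Lazy vs. eager: KNN stores the training data; the tree builds a model up front.
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier   # lazy (instance-based)
from sklearn.tree import DecisionTreeClassifier      # eager (model-based)

X, y = make_classification(n_samples=1000, random_state=0)

knn = KNeighborsClassifier().fit(X, y)       # "training" mostly just stores X and y
tree = DecisionTreeClassifier().fit(X, y)    # training builds the whole decision tree

# At prediction time the roles reverse: KNN must search the stored training data,
# while the tree only walks a few branches that were built during training.
print(knn.predict(X[:3]), tree.predict(X[:3]))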

Classification Model Evaluation

Evaluating a classification model is an important step in machine learning, as it helps to assess the performance and generalization ability of the model on new, unseen data. There are several metrics and techniques that can be used to evaluate a classification model, depending on the specific problem and requirements. Here are some commonly used evaluation metrics, followed by a short code sketch:

 Classification Accuracy: The proportion of correctly classified instances over the total number of instances in the test set. It is a simple and intuitive metric but can be misleading in imbalanced datasets where the majority class dominates the accuracy score.
 Confusion matrix: A table that shows the number of true positives, true negatives,
false positives, and false negatives for each class, which can be used to calculate
various evaluation metrics.
 Precision and Recall: Precision measures the proportion of true positives over the
total number of predicted positives, while recall measures the proportion of true
positives over the total number of actual positives. These metrics are useful in
scenarios where one class is more important than the other, or when there is a
trade-off between false positives and false negatives.
 F1-Score: The harmonic mean of precision and recall, calculated as 2 x (precision x
recall) / (precision + recall). It is a useful metric for imbalanced datasets where both
precision and recall are important.
 ROC curve and AUC: The Receiver Operating Characteristic (ROC) curve is a plot of
the true positive rate (recall) against the false positive rate (1-specificity) for
different threshold values of the classifier’s decision function. The Area Under the
Curve (AUC) measures the overall performance of the classifier, with values ranging
from 0.5 (random guessing) to 1 (perfect classification).
 Cross-validation: A technique that divides the data into multiple folds and, for each fold, trains the model on the remaining folds while testing on that fold, to obtain a more robust estimate of the model’s performance.
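
A minimal sketch computing several of these metrics with scikit-learn; the dataset, model, and split are illustrative assumptions:

# Evaluating a binary classifier with the metrics described above.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import (accuracy_score, confusion_matrix, precision_score,
                             recall_score, f1_score, roc_auc_score)

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(accuracy_score(y_test, y_pred))                # classification accuracy
print(confusion_matrix(y_test, y_pred))              # TN / FP / FN / TP counts
print(precision_score(y_test, y_pred), recall_score(y_test, y_pred))
print(f1_score(y_test, y_pred))                      # harmonic mean of precision and recall
print(roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))  # area under the ROC curve

# Cross-validation: a more robust estimate than a single train/test split.
print(cross_val_score(LogisticRegression(), X, y, cv=5).mean())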

Applications of Classification Algorithms

Classification algorithms are widely used in many real-world applications across various
domains, including:

 Email spam filtering
 Credit risk assessment
 Medical diagnosis
 Image classification
 Sentiment analysis
 Fraud detection
 Quality control
 Recommendation systems

Some algorithms are designed for binary classification problems. Examples include:

 Logistic Regression
 Perceptron
 Support Vector Machines

As such, they cannot be used for multi-class classification tasks, at least not directly.

Instead, heuristic methods can be used to split a multi-class classification problem into multiple binary classification datasets and train a binary classification model on each.

Two examples of these heuristic methods include:

 One-vs-Rest (OvR)
 One-vs-One (OvO)

One-Vs-Rest for Multi-Class Classification

One-vs-rest (OvR for short, also referred to as One-vs-All or OvA) is a heuristic method for
using binary classification algorithms for multi-class classification.

It involves splitting the multi-class dataset into multiple binary classification problems. A
binary classifier is then trained on each binary classification problem and predictions are
made using the model that is the most confident.

For example, consider a multi-class classification problem with examples for each of the classes ‘red’, ‘blue’, and ‘green’. This could be divided into three binary classification datasets as follows:

 Binary Classification Problem 1: red vs. [blue, green]
 Binary Classification Problem 2: blue vs. [red, green]
 Binary Classification Problem 3: green vs. [red, blue]

A possible downside of this approach is that it requires one model to be created for each class. For example, three classes require three models. This can be an issue for large datasets (e.g. millions of rows), slow models (e.g. neural networks), or very large numbers of classes (e.g. hundreds of classes).
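
scikit-learn implements this strategy directly; a minimal sketch, with the dataset and base model as illustrative assumptions:

# One-vs-rest: one binary classifier is trained per class.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)            # three classes

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(len(ovr.estimators_))                  # 3 -> one binary model per class
print(ovr.predict(X[:5]))                    # the most confident model's class wins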

One-Vs-One for Multi-Class Classification

One-vs-One (OvO for short) is another heuristic method for using binary classification
algorithms for multi-class classification.

Like one-vs-rest, one-vs-one splits a multi-class classification dataset into binary classification problems. Unlike one-vs-rest, which creates one binary dataset for each class, the one-vs-one approach creates one dataset for each class versus every other class.

For example, consider a multi-class classification problem with four classes: ‘red’, ‘blue’, ‘green’, and ‘yellow’. This could be divided into six binary classification datasets as follows:

 Binary Classification Problem 1: red vs. blue
 Binary Classification Problem 2: red vs. green
 Binary Classification Problem 3: red vs. yellow
 Binary Classification Problem 4: blue vs. green
 Binary Classification Problem 5: blue vs. yellow
 Binary Classification Problem 6: green vs. yellow

This means significantly more datasets, and in turn more models, than the one-vs-rest strategy described in the previous section.

The formula for calculating the number of binary datasets, and in turn, models, is as follows:

(NumClasses * (NumClasses – 1)) / 2

We can see that for four classes, this gives us the expected value of six binary classification
problems:

(NumClasses * (NumClasses – 1)) / 2

(4 * (4 – 1)) / 2

(4 * 3) / 2

12 / 2

6

Each binary classification model predicts one class label, and the class that receives the most predictions or votes across all models is chosen as the final prediction by the one-vs-one strategy.
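
A matching sketch using scikit-learn's one-vs-one wrapper; the dataset and base model are illustrative assumptions:

# One-vs-one: one binary classifier is trained per pair of classes.
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)            # three classes

ovo = OneVsOneClassifier(SVC()).fit(X, y)
print(len(ovo.estimators_))                  # 3 * (3 - 1) / 2 = 3 pairwise models
print(ovo.predict(X[:5]))                    # the class with the most votes wins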
