Classification
Foundational Math of AI, S24
Yashil Sukurdeep
June 27, 2024
1 Classification: Fundamentals
In this lecture, we will explore the world of classification in machine learning.
Classification is a type of supervised learning where the goal is to predict the
category of a given input based on previously seen examples. We will discuss
two main types of classification problems: binary classification and multi-class
classification. We will also introduce two popular classification algorithms: the
k-Nearest Neighbors (kNN) classifier and the Naive Bayes’ classifier.
1.1 Binary classification
Binary classification involves categorizing data into one of two classes.
Example 1.1 (Spam Detection). Consider the task of classifying text messages
as spam or ham (i.e., not spam). Here, our input data is a text message, which
of course is not exactly a vector in $\mathbb{R}^d$. Nevertheless, any given text message can
be represented by a feature vector, such as an array containing the frequency of
certain keywords appearing in the text message. Based on these feature vectors,
we can build a model to predict the class of new text messages, where the set of
possible classes is C = {ham, spam}.
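For instance, here is a minimal Python sketch of such a feature map. The keyword list and the example message are hypothetical, chosen purely for illustration.

import re

# A hypothetical list of keywords whose frequencies form the feature vector.
KEYWORDS = ["free", "winner", "prize", "urgent", "meeting"]

def featurize(message):
    """Map a text message to a feature vector in R^d, with d = len(KEYWORDS)."""
    words = re.findall(r"[a-z]+", message.lower())
    return [words.count(kw) for kw in KEYWORDS]

print(featurize("URGENT: you are a winner, claim your free prize"))
# Output: [1, 1, 1, 1, 0]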
2 Classification Algorithms
We now turn our attention to a couple of widely-used classification algorithms,
or classifiers.
2.2 The k-Nearest Neighbors (kNN) Classifier
The kNN classifier labels a new data point according to the labels of the training samples closest to it. It proceeds as follows:
1. Store the labeled training samples and fix a number of neighbors k.
2. For a new data point $\vec{x}$, calculate the distance between $\vec{x}$ and all the samples in the training set. While many choices exist, common functions used to calculate the distance between two vectors $\vec{x}, \vec{y} \in \mathbb{R}^d$ include:
• Euclidean Distance:
\[
d(\vec{x}, \vec{y}) = \sqrt{\sum_{i=1}^{d} (\vec{x}_i - \vec{y}_i)^2}
\]
• Manhattan Distance:
\[
d(\vec{x}, \vec{y}) = \sum_{i=1}^{d} |\vec{x}_i - \vec{y}_i|
\]
• Minkowski Distance (for $p \geq 1$):
\[
d(\vec{x}, \vec{y}) = \left( \sum_{i=1}^{d} |\vec{x}_i - \vec{y}_i|^p \right)^{1/p}
\]
3. Sort the distances and identify the k nearest neighbors of $\vec{x}$, namely the k training samples with the smallest distances to it.
4. Assign $\vec{x}$ the majority class label among its k nearest neighbors (see the sketch after this list).
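The following is a minimal Python sketch of this procedure, assuming the training set is given as a list of (feature vector, label) pairs; all names are our own, chosen for illustration.

import math
from collections import Counter

def euclidean(x, y):
    """Euclidean distance between two vectors of equal dimension."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def knn_classify(train, x, k=3, dist=euclidean):
    """Predict the label of x by majority vote among its k nearest neighbors.

    train is a list of (feature_vector, label) pairs (step 1: stored data).
    """
    # Step 2: compute the distance from x to every training sample.
    scored = [(dist(u, x), label) for u, label in train]
    # Step 3: sort by distance and keep the k nearest neighbors.
    neighbors = sorted(scored, key=lambda pair: pair[0])[:k]
    # Step 4: take a majority vote among the neighbors' labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy usage with hypothetical two-dimensional feature vectors:
train = [((0.0, 0.0), "ham"), ((0.1, 0.2), "ham"),
         ((3.0, 4.0), "spam"), ((3.5, 3.8), "spam")]
print(knn_classify(train, (0.2, 0.1), k=3))  # "ham"

Note that the Manhattan or Minkowski distance can be swapped in through the dist argument.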
2.3 Naive Bayes’ Classifier
The Naive Bayes’ classifier is based on Bayes’ theorem and assumes that features
(in a feature vector) are conditionally independent given the class label. Despite
this strong assumption, it often performs well in practice.
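Concretely, for a feature vector $\vec{x} = (\vec{x}_1, \ldots, \vec{x}_d)$, Bayes' theorem combined with the conditional independence assumption yields
\[
P(y \mid \vec{x}) = \frac{P(\vec{x} \mid y) \, P(y)}{P(\vec{x})} \propto P(y) \prod_{k=1}^{d} P(\vec{x}_k \mid y),
\]
where the denominator $P(\vec{x})$ can be dropped when comparing classes, since it does not depend on $y$. The classifier is built in two phases: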
1. Training phase:
• Calculate the prior probability $P(y)$ for each class $y \in C$.
• Calculate the likelihood $P(\vec{x}_k \mid y)$ for each feature $\vec{x}_k$ given each class $y \in C$.
2. Classification phase: For a new data point $\vec{x}$, assign the class that maximizes the (unnormalized) posterior probability, i.e., output
\[
\hat{y} = \arg\max_{y \in C} \; P(y) \prod_{k=1}^{d} P(\vec{x}_k \mid y),
\]
as sketched below.
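The sketch below implements both phases in Python for categorical features, with likelihoods estimated by simple empirical frequencies; all names are our own, and a production implementation would typically add smoothing to avoid zero counts.

import math
from collections import Counter

def train_naive_bayes(data):
    """Estimate the priors P(y) and likelihoods P(x_k | y) by counting.

    data is a list of (feature_tuple, label) pairs with categorical features.
    """
    n = len(data)
    class_sizes = Counter(label for _, label in data)
    priors = {y: c / n for y, c in class_sizes.items()}
    # counts[(k, v, y)] = number of class-y samples whose k-th feature equals v
    counts = Counter()
    for x, y in data:
        for k, v in enumerate(x):
            counts[(k, v, y)] += 1

    def likelihood(k, v, y):
        return counts[(k, v, y)] / class_sizes[y]

    return priors, likelihood

def naive_bayes_classify(priors, likelihood, x):
    """Return the class maximizing P(y) * prod_k P(x_k | y)."""
    scores = {y: p * math.prod(likelihood(k, v, y) for k, v in enumerate(x))
              for y, p in priors.items()}
    return max(scores, key=scores.get)

# Toy usage: each message is encoded by two categorical features.
data = [(("free", "no_meeting"), "spam"), (("free", "no_meeting"), "spam"),
        (("no_free", "meeting"), "ham"), (("no_free", "no_meeting"), "ham")]
priors, likelihood = train_naive_bayes(data)
print(naive_bayes_classify(priors, likelihood, ("free", "no_meeting")))  # "spam"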
3 Performance Metrics for Classification Algorithms
To evaluate the performance of classification algorithms, several metrics are commonly used. These metrics provide insight into how well a classifier is performing and help in comparing different classifiers. To illustrate the definitions of these metrics, let us focus on the binary classification setting where the two classes are C = {Positive, Negative}.
3.1 True and False Positives and Negatives
Each prediction made by a binary classifier falls into one of four cases: a true positive (TP) or true negative (TN) when the predicted class matches the actual class, and a false positive (FP) or false negative (FN) when it does not. Below, TP, TN, FP, and FN denote the number of test instances in each case.
3.2 Accuracy
Accuracy is the ratio of correctly predicted instances to the total number of instances. It is a simple metric that captures the overall effectiveness of the classifier.
\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]
3.3 Precision
Precision is the ratio of correctly predicted positive observations to the total
predicted positives. It indicates how many of the predicted positive instances
are actually positive.
\[
\text{Precision} = \frac{TP}{TP + FP}
\]
3.4 Recall
Recall (also known as Sensitivity or True Positive Rate) is the ratio of correctly predicted positive observations to all observations in the actual positive class. It measures how well the classifier identifies positive instances.
\[
\text{Recall} = \frac{TP}{TP + FN}
\]
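As a quick illustration, the following Python sketch computes all three metrics from lists of actual and predicted labels; the label data is hypothetical.

def confusion_counts(actual, predicted, positive="Positive"):
    """Count true/false positives and negatives for a binary classifier's output."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    return tp, tn, fp, fn

actual    = ["Positive", "Positive", "Negative", "Negative", "Positive"]
predicted = ["Positive", "Negative", "Negative", "Positive", "Positive"]
tp, tn, fp, fn = confusion_counts(actual, predicted)
print("Accuracy: ", (tp + tn) / (tp + tn + fp + fn))  # 0.6
print("Precision:", tp / (tp + fp))                   # 2/3
print("Recall:   ", tp / (tp + fn))                   # 2/3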
These metrics help in understanding the performance of the classification algo-
rithm beyond simple accuracy, providing a more detailed view of the classifier’s
ability to correctly identify positive and negative instances.
4 Conclusion
In this lecture, we introduced the concepts of binary and multi-class classifica-
tion and discussed two popular classification algorithms: k-Nearest Neighbors
(kNN) and the Naive Bayes’ classifier. These methods form the basis of many
machine learning applications, from spam detection to image recognition. We
also discussed several performance metrics, which allow us to evaluate our classifiers and quantify the confidence we can place in their predictions.