
Classification in Machine Learning

Yashil Sukurdeep
June 27, 2024

1 Classification: Fundamentals
In this lecture, we will explore the world of classification in machine learning.
Classification is a type of supervised learning where the goal is to predict the
category of a given input based on previously seen examples. We will discuss
two main types of classification problems: binary classification and multi-class
classification. We will also introduce two popular classification algorithms: the
k-Nearest Neighbors (kNN) classifier and the Naive Bayes’ classifier.

The following framework is typically used to tackle classification problems:


• Building a training set: The training set refers to a set of labelled examples (X_train, Y_train) for the classification task, where:
  – X_train = {⃗x_1, . . . , ⃗x_n} is the input data that we wish to classify, e.g. patients, images, text messages, and so on. We will assume that each sample from the training set is a d-dimensional vector, i.e., ⃗x_i ∈ R^d for all i = 1, . . . , n. Often, each training sample will in fact be a feature vector; see Examples 1.1 and 1.2. We will express the elements of a d-dimensional (feature) vector as ⃗x = (⃗x_1, . . . , ⃗x_d), and refer to them as features.
  – Y_train = {y_1, . . . , y_n} are the labels, where the label y_i corresponds to the class of sample ⃗x_i for each i = 1, . . . , n. The set of possible classes for each label y_i will be denoted by C.
• Training a classifier: The classifier is a model which uses the training set to learn how to assign a class to any given input. There are various types of classifiers that one can use.
• Testing the classifier: Once trained, we apply the classifier to a set of unseen examples (for which we know the labels, but the classifier does not), which is called the test set (or the validation set).
• Evaluating the classifier’s performance: The goal here is to determine how well the classifier has learnt to classify new inputs that it has not seen before. We measure the performance of a classifier through a set of performance metrics.

1.1 Binary classification
Binary classification involves categorizing data into one of two classes.
Example 1.1 (Spam Detection). Consider the task of classifying text messages
as spam or ham (i.e., not spam). Here, our input data is a text message, which
of course is not exactly a vector in R^d. Nevertheless, any given text message can
be represented by a feature vector, such as an array containing the frequency of
certain keywords appearing in the text message. Based on these feature vectors,
we can build a model to predict the class of new text messages, where the set of
possible classes is C = {ham, spam}.
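As a rough sketch of how such a feature vector might be built, one can count occurrences of a few chosen keywords in each message. The keyword list, the helper name to_feature_vector, and the example message below are made up for illustration; in practice the keywords would be chosen from the training data.

```python
# Hypothetical keyword list; in practice it would be derived from the training set.
keywords = ["free", "winner", "meeting", "lunch"]

def to_feature_vector(message: str) -> list[int]:
    """Represent a text message as keyword-frequency features (one entry per keyword)."""
    words = message.lower().split()
    return [words.count(kw) for kw in keywords]

print(to_feature_vector("you are a winner claim your free free prize now"))
# -> [2, 1, 0, 0]
```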

1.2 Multi-Class Classification


Multi-class classification involves categorizing data into more than two classes.
Example 1.2 (Handwritten Digit Recognition). Each image of a handwritten
digit can be represented by its pixel values. We can use these pixel values as our
feature vector for each image. However, this might be a very high-dimensional
feature vector (think of how many pixels there are in the image captured by your
smartphone)! As a result, we often use a simplified feature vector for each
image, and use these feature vectors to build a model that classifies each image
into one of 10 classes, i.e., C = {0, 1, . . . , 9}.
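As a rough illustration, scikit-learn's small digits dataset (our choice here; the lecture does not prescribe a specific dataset) already stores each 8×8 grayscale image both as an image and as a flattened 64-dimensional feature vector:

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()
print(digits.images.shape)         # (1797, 8, 8): 1797 images of 8x8 pixels
print(digits.data.shape)           # (1797, 64): each image flattened into a vector in R^64
print(np.unique(digits.target))    # the 10 classes: [0 1 2 3 4 5 6 7 8 9]
```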

2 Classification Algorithms
We now turn our attention to a couple of widely-used classification algorithms,
or classifiers.

2.1 k-Nearest Neighbors (kNN)


The k-Nearest Neighbors algorithm is a simple, instance-based learning method
where the class of a new sample is determined by the majority class among its
k nearest neighbors in the training data, for some chosen k ∈ N. The kNN
algorithm works as follows (a short code sketch follows these steps):

1. Choose the number of neighbors k.

2. For a new data point ⃗x, calculate the distance between ⃗x and all the samples in the training set. While many choices exist, common functions used to calculate the distance between two vectors ⃗x, ⃗y ∈ R^d include:
   • Euclidean Distance:
     $$d(\vec{x}, \vec{y}) = \sqrt{\sum_{i=1}^{d} (\vec{x}_i - \vec{y}_i)^2}$$
   • Manhattan Distance:
     $$d(\vec{x}, \vec{y}) = \sum_{i=1}^{d} |\vec{x}_i - \vec{y}_i|$$
   • Minkowski Distance, where p ≥ 1:
     $$d(\vec{x}, \vec{y}) = \left( \sum_{i=1}^{d} |\vec{x}_i - \vec{y}_i|^p \right)^{1/p}$$

3. Sort the distances and determine the k-nearest neighbors based on the
smallest distances.
4. Assign the class label based on the majority class among the k-nearest
neighbors.
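A minimal sketch of these four steps in Python, using the Euclidean distance and plain NumPy; the helper name knn_predict and the toy training data are our own illustration, not a fixed API:

```python
import numpy as np
from collections import Counter

def knn_predict(x_new, X_train, Y_train, k=3):
    """Classify x_new by majority vote among its k nearest training samples."""
    # Step 2: Euclidean distances between x_new and every training sample.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Step 3: indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Step 4: majority class among those k neighbors.
    votes = Counter(Y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Tiny made-up training set with two classes.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
Y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(np.array([1.1, 0.9]), X_train, Y_train, k=3))  # -> "A"
```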

Figure 1: Illustration of the k-Nearest Neighbors algorithm. The new data point ⃗x_test is classified based on the majority class of its k = 1, 3, 4 nearest neighbors (red circles and blue circles). The example with k = 4 shows why, in binary classification, it is a good idea to choose an odd k: an even k can produce a tie.

2.2 Example: Iris Flower Classification


The Iris dataset is one of the earliest datasets used in the literature on classification methods, and it remains widely used in statistics and machine learning. It contains three classes of iris flowers, C = {Iris Setosa, Iris Versicolour, Iris Virginica}. Each iris flower in the dataset is represented by a 4-dimensional feature vector, i.e., a vector in R^4. The four features are the sepal length, sepal width, petal length, and petal width. To classify a new flower, we find its k nearest neighbors in the feature space and assign the class based on the majority vote.
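A hedged sketch of this example using scikit-learn's built-in copy of the Iris dataset; the choice of library, k = 5, and the 70/30 train/test split are our own assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()  # 150 flowers, 4 features each: sepal/petal length and width
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)   # k = 5 nearest neighbors
knn.fit(X_train, y_train)

# Classify a new flower from its 4-dimensional feature vector.
new_flower = [[5.1, 3.5, 1.4, 0.2]]  # sepal length/width, petal length/width (cm)
print(iris.target_names[knn.predict(new_flower)[0]])   # e.g. 'setosa'
print("Test accuracy:", knn.score(X_test, y_test))
```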

2.3 Naive Bayes’ Classifier
The Naive Bayes’ classifier is based on Bayes’ theorem and assumes that features
(in a feature vector) are conditionally independent given the class label. Despite
this strong assumption, it often performs well in practice.

• Bayes’ Rule (or Bayes’ Theorem): Recall Bayes’ theorem, which states:
  $$P(y \mid \vec{x}) = \frac{P(\vec{x} \mid y)\,P(y)}{P(\vec{x})}$$
  where P(y|⃗x) is the posterior probability of class y ∈ C given features ⃗x ∈ R^d, P(⃗x|y) is the likelihood of features ⃗x given class y, P(y) is the prior probability of class y, and P(⃗x) is the probability of the features ⃗x.
• Naive Bayes’ Assumption: The Naive Bayes classifier assumes that each feature ⃗x_k in the feature vector ⃗x = (⃗x_1, . . . , ⃗x_d) is conditionally independent of every other feature given the class label y. This simplifies the computation of the likelihood P(⃗x|y):
  $$P(\vec{x} \mid y) = P(\vec{x}_1, \vec{x}_2, \dots, \vec{x}_d \mid y) = \prod_{k=1}^{d} P(\vec{x}_k \mid y)$$

• Classification Rule: To classify a new instance, we compute the posterior probability for each class and choose the class with the highest posterior probability:
  $$\hat{y} = \operatorname*{argmax}_{y \in C} P(y \mid \vec{x}) = \operatorname*{argmax}_{y \in C} P(y) \prod_{k=1}^{d} P(\vec{x}_k \mid y)$$

Steps to Classify Data Using Naive Bayes

1. Training phase:
   • Calculate the prior probability P(y) for each class y ∈ C.
   • Calculate the likelihood P(⃗x_k|y) for each feature ⃗x_k given each class y ∈ C.
2. Classification phase:
   • For a new data point ⃗x ∈ R^d, calculate the posterior probability P(y|⃗x) for each class y ∈ C.
   • Assign the data point to the class with the highest posterior probability, as sketched in the code below.
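A minimal sketch of both phases for a tiny, made-up spam/ham dataset with binary keyword features; the data, the Bernoulli likelihood model, and the Laplace (add-one) smoothing are our own assumptions, not prescribed by the lecture:

```python
import numpy as np

# Made-up training set: binary keyword features (1 = keyword present), labels spam/ham.
X_train = np.array([[1, 1, 0],
                    [1, 0, 1],
                    [0, 0, 1],
                    [0, 1, 0]])
y_train = np.array(["spam", "spam", "ham", "ham"])
classes = np.unique(y_train)

# Training phase: priors P(y) and per-feature likelihoods P(x_k = 1 | y),
# with Laplace smoothing to avoid zero probabilities.
priors = {c: np.mean(y_train == c) for c in classes}
likelihoods = {c: (X_train[y_train == c].sum(axis=0) + 1) / ((y_train == c).sum() + 2)
               for c in classes}

def predict(x):
    """Classification phase: return the class with the highest posterior probability."""
    best_class, best_score = None, -np.inf
    for c in classes:
        p = likelihoods[c]
        # log P(y) + sum_k log P(x_k | y), assuming Bernoulli features
        score = np.log(priors[c]) + np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

print(predict(np.array([1, 0, 0])))  # -> "spam"
```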

3 Performance Metrics for Classification Algorithms
To evaluate the performance of classification algorithms, several metrics are
commonly used. These metrics provide insights into how well the classifier is
performing and help in comparing different classifiers. To illustrate the definitions
of these metrics, let us focus on the binary classification setting where the
two classes are, for instance, C = {Positive, Negative}.

3.1 Confusion Matrix


The confusion matrix is a table used to describe the performance of a classi-
fication model on a set of test data for which the true values are known. It
provides a comprehensive breakdown of the classifier’s performance by showing
the counts of true positive (TP), true negative (TN), false positive (FP), and
false negative (FN) predictions.

                     Predicted Positive    Predicted Negative
Actual Positive      TP                    FN
Actual Negative      FP                    TN

Table 1: Confusion Matrix for a Binary Classification Problem
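A quick sketch of how these four counts could be tallied from toy predictions; the labels below are invented purely for illustration:

```python
# Made-up true labels and classifier predictions for a binary task.
actual    = ["Positive", "Positive", "Negative", "Negative", "Positive", "Negative"]
predicted = ["Positive", "Negative", "Negative", "Positive", "Positive", "Negative"]

TP = sum(a == "Positive" and p == "Positive" for a, p in zip(actual, predicted))
TN = sum(a == "Negative" and p == "Negative" for a, p in zip(actual, predicted))
FP = sum(a == "Negative" and p == "Positive" for a, p in zip(actual, predicted))
FN = sum(a == "Positive" and p == "Negative" for a, p in zip(actual, predicted))
print(TP, TN, FP, FN)  # -> 2 2 1 1
```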

3.2 Accuracy
Accuracy is the ratio of correctly predicted instances to the total number of instances. It is a simple metric that provides an overall measure of the classifier’s effectiveness.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

3.3 Precision
Precision is the ratio of correctly predicted positive observations to the total
predicted positives. It indicates how many of the predicted positive instances
are actually positive.
$$\text{Precision} = \frac{TP}{TP + FP}$$

3.4 Recall
Recall (also known as Sensitivity or True Positive Rate) is the ratio of correctly predicted positive observations to all observations in the actual positive class. It measures how well the classifier identifies positive instances.
$$\text{Recall} = \frac{TP}{TP + FN}$$
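Continuing the toy counts from the confusion-matrix sketch above (TP = 2, TN = 2, FP = 1, FN = 1), the three metrics are straightforward to compute:

```python
TP, TN, FP, FN = 2, 2, 1, 1

accuracy  = (TP + TN) / (TP + TN + FP + FN)   # 4/6 ≈ 0.67
precision = TP / (TP + FP)                    # 2/3 ≈ 0.67
recall    = TP / (TP + FN)                    # 2/3 ≈ 0.67
print(accuracy, precision, recall)
```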

These metrics help in understanding the performance of the classification algo-
rithm beyond simple accuracy, providing a more detailed view of the classifier’s
ability to correctly identify positive and negative instances.

4 Conclusion
In this lecture, we introduced the concepts of binary and multi-class classifica-
tion and discussed two popular classification algorithms: k-Nearest Neighbors
(kNN) and the Naive Bayes’ classifier. These methods form the basis of many
machine learning applications, from spam detection to image recognition. We
also discussed performance metrics that help us evaluate our classifiers and
quantify how much confidence to place in their predictions.
