W4 - Logistic Regression
Spring – 2025
Let’s make an email spam filter
- Suppose you want to develop an ML-based model that detects whether a certain email is important or spam.

  Example (not spam):
    From: [email protected]
    Date: February 13, 2023
    Subject: CS-370 Announcement
    Dear students,
    Please note that the first quiz on CS-370 will be conducted on …

  Example (spam):
    From: [email protected]
    Date: February 13, 2023
    Subject: URGENT!!!
    Hello Dear,
    I am a Nigerian prince, and I have a business proposal for you ...

- Input: 𝒙 = email message
- Output: 𝑦 ∈ {spam, not-spam} or 𝑦 ∈ {+1, −1}
- Objective: Learn a predictor 𝑓 such that 𝒙 → Model → 𝑦 ∈ {+1, −1}
- The training dataset is a partial specification of the desired behaviour:
  𝒟_train = [("… CS-370 …", −1), ("… 10 million USD …", +1), ("… PVC pipes at reduced …", +1)]

https://www.youtube.com/watch?v=4o5hSxvN_-s
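As a minimal sketch (the variable name and representation are illustrative, not from the course code), the training set can be stored as a list of (text, label) pairs:

    # Hypothetical representation of the partially-specified training set.
    # Labels follow the slide's convention: +1 = spam, -1 = not spam.
    D_train = [
        ("... CS-370 ...", -1),
        ("... 10 million USD ...", +1),
        ("... PVC pipes at reduced ...", +1),
    ]

    for text, label in D_train:
        print(f"{label:+d}  {text}")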
In machine learning, input features are hand-crafted
- Let the example task be to predict whether a string 𝑥 is a valid email address.
- A feature extractor maps an input such as [email protected] to a set of feature values:
    Length>10       : True  → 1
    fracOfAlphabets : 0.85  → 0.85
    Contains_@      : True  → 1
    endsWith_.com   : True  → 1
    endsWith_.edu   : False → 0
- For an input 𝑥, its feature vector is
  𝝓(𝑥) = [𝜙₁(𝑥), 𝜙₂(𝑥), …, 𝜙_d(𝑥)]
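A minimal sketch of such a hand-crafted feature extractor; the function name is illustrative, but the features mirror the slide:

    def feature_extractor(x: str) -> list:
        """Map a string to the hand-crafted feature vector phi(x) from the slide."""
        frac_alpha = sum(c.isalpha() for c in x) / max(len(x), 1)
        return [
            1.0 if len(x) > 10 else 0.0,          # Length>10
            frac_alpha,                            # fracOfAlphabets
            1.0 if "@" in x else 0.0,              # Contains_@
            1.0 if x.endswith(".com") else 0.0,    # endsWith_.com
            1.0 if x.endswith(".edu") else 0.0,    # endsWith_.edu
        ]

    print(feature_extractor("[email protected]"))   # [1.0, 0.846..., 1.0, 1.0, 0.0]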
A linear classifier calculates scores to predict classes
- Weight vector 𝒘 ∈ ℝ^d, with one weight per feature, e.g.
    Length>10       : -1.2
    fracOfAlphabets : 0.6
    Contains_@      : 3
    endsWith_.com   : 2.2
    endsWith_.edu   : 2.8
- The score is 𝒘 · 𝝓(𝑥), and for a labelled example the margin is
  $\text{margin} = (\boldsymbol{w} \cdot \phi(x))\, y$
- A linear classifier maps the score to a class using an appropriate function:
  $f_{\boldsymbol{w}}(x) = \operatorname{sign}(\boldsymbol{w} \cdot \phi(x)) = \begin{cases} +1, & \text{if } \boldsymbol{w} \cdot \phi(x) > 0 \\ -1, & \text{if } \boldsymbol{w} \cdot \phi(x) < 0 \\ ?, & \text{if } \boldsymbol{w} \cdot \phi(x) = 0 \end{cases}$
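A sketch of scoring and predicting with these weights, reusing the feature_extractor sketched earlier (the helper names are illustrative):

    def predict(w, phi):
        """Linear classifier: the sign of the score w . phi(x)."""
        score = sum(wi * fi for wi, fi in zip(w, phi))
        return +1 if score > 0 else -1 if score < 0 else 0   # 0 marks the undecided boundary case

    w = [-1.2, 0.6, 3.0, 2.2, 2.8]              # the weight vector from the slide
    phi = feature_extractor("[email protected]")      # [1, ~0.85, 1, 1, 0]
    print(predict(w, phi))                       # score ~ -1.2 + 0.51 + 3 + 2.2 = +4.5 -> +1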
Relationship between data and weights can be visualised on a 2D plane
- Suppose we have
  $f_{\boldsymbol{w}}(x) = \operatorname{sign}(\boldsymbol{w} \cdot \phi(x))$, with $\boldsymbol{w} = [2, -1]$ and $\phi(x) \in \{[2, 0], [0, 2], [2, 4]\}$
  [Figure: the three feature vectors plotted in 2D; the boundary 𝒘 · 𝝓(𝑥) = 0 passes through the origin with normal vector 𝒘, the + region on the side 𝒘 points towards and the − region on the other side.]
- In general, a binary classifier 𝑓_𝒘 defines a hyperplane decision boundary with normal vector 𝒘.
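Working out the scores for the three feature vectors with $\boldsymbol{w} = [2, -1]$ (a direct application of the definition above):

    $\boldsymbol{w} \cdot [2, 0] = 2 \cdot 2 + (-1) \cdot 0 = 4 > 0 \;\Rightarrow\; +1$
    $\boldsymbol{w} \cdot [0, 2] = 2 \cdot 0 + (-1) \cdot 2 = -2 < 0 \;\Rightarrow\; -1$
    $\boldsymbol{w} \cdot [2, 4] = 2 \cdot 2 + (-1) \cdot 4 = 0 \;\Rightarrow\;$ on the decision boundary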
Which loss function is suitable for a binary classifier?
- What are the pros and cons of zero-one loss?
  $\text{Loss}_{0\text{-}1}(\boldsymbol{x}, y, \boldsymbol{w}) = \mathbf{1}\bigl[(\boldsymbol{w} \cdot \phi(x))\, y \le 0\bigr]$
  $\nabla \text{Loss}_{0\text{-}1}(\boldsymbol{x}, y, \boldsymbol{w}) = \begin{cases} \infty, & \text{if margin} = 0 \\ 0, & \text{if margin} \ne 0 \end{cases}$
  The gradient is zero almost everywhere, so gradient descent cannot use it to make progress.
- The hinge loss has a non-trivial gradient:
  $\text{Loss}_{\text{hinge}}(\boldsymbol{x}, y, \boldsymbol{w}) = \max\bigl\{1 - (\boldsymbol{w} \cdot \phi(x))\, y,\ 0\bigr\}$
  $\nabla \text{Loss}_{\text{hinge}}(\boldsymbol{x}, y, \boldsymbol{w}) = \begin{cases} 0, & \text{if margin} > 1 \\ -\phi(x)\, y, & \text{if margin} \le 1 \end{cases}$
  [Figure: zero-one loss and hinge loss plotted against the margin (𝒘 · 𝝓(𝑥)) 𝑦.]
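A short sketch of the hinge loss and its subgradient for a single example, following the definitions above (names are illustrative):

    def hinge_loss(w, phi, y):
        """Hinge loss: max(1 - margin, 0) with margin = (w . phi(x)) * y."""
        margin = sum(wi * fi for wi, fi in zip(w, phi)) * y
        return max(1.0 - margin, 0.0)

    def hinge_gradient(w, phi, y):
        """Subgradient w.r.t. w: -phi(x)*y when margin <= 1, otherwise the zero vector."""
        margin = sum(wi * fi for wi, fi in zip(w, phi)) * y
        return [0.0] * len(w) if margin > 1.0 else [-fi * y for fi in phi]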
Which loss function is suitable for a binary classifier (continued)?
- Logistic loss (also called sigmoid or log loss) is the most common loss function for binary classification:
  $\text{Loss}_{\text{logistic}}(\boldsymbol{x}, y, \boldsymbol{w}) = \log\bigl(1 + e^{-\text{margin}}\bigr) = \log\bigl(1 + e^{-(\boldsymbol{w} \cdot \phi(x))\, y}\bigr)$
  [Figure: 0-1 loss, hinge loss, and logistic loss plotted against the margin.]
- It tries to increase the margin even when the margin already exceeds 1.
- Why is it important to take the average loss to update the weights?
- Averaging the loss over the training examples (here, the three emails) updates the weights in a direction that satisfies most training examples.
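A sketch of the logistic loss and the gradient of its average over a small dataset (the data layout, a list of (phi, y) pairs with y ∈ {+1, −1}, is an assumption):

    import math

    def logistic_loss(w, phi, y):
        """log(1 + exp(-margin)) with margin = (w . phi(x)) * y."""
        margin = sum(wi * fi for wi, fi in zip(w, phi)) * y
        return math.log(1.0 + math.exp(-margin))

    def average_logistic_gradient(w, data):
        """Gradient of the mean logistic loss; d/dw log(1 + e^{-m}) = -phi * y * sigmoid(-m)."""
        grad = [0.0] * len(w)
        for phi, y in data:
            margin = sum(wi * fi for wi, fi in zip(w, phi)) * y
            coeff = -y / (1.0 + math.exp(margin))       # = -y * sigmoid(-margin)
            for i, fi in enumerate(phi):
                grad[i] += coeff * fi / len(data)
        return grad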
Classification tasks can have multiple variations
- Multiclass classification: the output 𝑦 is one of several categories rather than just two.
- Ranking: the output 𝑦 is an ordering of items (e.g. search results).
- Structured prediction: the output 𝑦 is a structured object such as a sequence or a tree.
A linear classifier can only fit data that are linearly separable
- Interactive demo: https://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html
- Video: https://www.youtube.com/watch?v=3liCbRZPrZA
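A small illustration of data that is not linearly separable (the XOR pattern; this example is not from the slides): no weights $w_1, w_2$ and bias $b$ can satisfy all four constraints, so no line separates the classes.

    $(0,0) \mapsto -1:\; b < 0$
    $(1,1) \mapsto -1:\; w_1 + w_2 + b < 0$
    $(1,0) \mapsto +1:\; w_1 + b > 0$
    $(0,1) \mapsto +1:\; w_2 + b > 0$

Adding the last two constraints gives $w_1 + w_2 + 2b > 0$, so $w_1 + w_2 + b > -b > 0$, contradicting the second constraint.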
Logistic Regression is binary classification using the logistic function
  $f_{\boldsymbol{w}}(x) = \sigma(\boldsymbol{w} \cdot \phi(x)) = \dfrac{1}{1 + e^{-\boldsymbol{w} \cdot \phi(x)}}$
- If 𝑦 = 1, we want 𝑓_𝒘(𝑥) ≈ 1, i.e. 𝒘 · 𝝓(𝑥) ≫ 0.
- If 𝑦 = 0, we want 𝑓_𝒘(𝑥) ≈ 0, i.e. 𝒘 · 𝝓(𝑥) ≪ 0.
  [Figure: the sigmoid 𝜎(𝒘 · 𝝓(𝑥)) plotted against 𝒘 · 𝝓(𝑥).]
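A minimal sketch of the logistic regression predictor (illustrative names; the 0.5 threshold is the usual default):

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def predict_proba(w, phi):
        """P(y = 1 | x) under logistic regression: sigma(w . phi(x))."""
        return sigmoid(sum(wi * fi for wi, fi in zip(w, phi)))

    def predict_label(w, phi, threshold=0.5):
        return 1 if predict_proba(w, phi) >= threshold else 0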
How to calculate the cost of an example using Negative Log Likelihood?
- If 𝑦 = 1, we want 𝑓_𝒘(𝑥) ≈ 1, so the cost should be small when 𝑓_𝒘(𝑥) is close to 1: 𝐶𝑜𝑠𝑡 = −log 𝑓_𝒘(𝑥).
- If 𝑦 = 0, we want 𝑓_𝒘(𝑥) ≈ 0, so the cost should be small when 𝑓_𝒘(𝑥) is close to 0: 𝐶𝑜𝑠𝑡 = −log(1 − 𝑓_𝒘(𝑥)).
- The two cases combine into the negative log likelihood of a single example:
  $\text{Cost}\bigl(f_{\boldsymbol{w}}(x), y\bigr) = -\,y \log f_{\boldsymbol{w}}(x) - (1 - y)\log\bigl(1 - f_{\boldsymbol{w}}(x)\bigr)$
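A sketch of the per-example negative log likelihood cost, following the piecewise form above (the clipping constant is a numerical safeguard, not from the slides):

    import math

    def nll_cost(p, y):
        """Negative log likelihood of one example; p = f_w(x) = P(y = 1 | x)."""
        eps = 1e-12
        p = min(max(p, eps), 1.0 - eps)   # guard against log(0)
        return -math.log(p) if y == 1 else -math.log(1.0 - p)

    print(nll_cost(0.9, 1))   # small cost: confident and correct
    print(nll_cost(0.9, 0))   # large cost: confident and wrong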
Decision boundary may or may not be linear
- Let
  $f_{\boldsymbol{w}}(x) = g\bigl(\boldsymbol{w} \cdot \boldsymbol{\phi}(x)\bigr) = g\bigl(w_1 \phi_1(x) + w_2 \phi_2(x) + w_3 \phi_1^2(x) + w_4 \phi_2^2(x)\bigr)$
- Set 𝒘 = [0, 0, 1, 1] and 𝑏 = 1. Predicting the positive class whenever the score reaches the threshold 𝑏 gives the boundary 𝜙₁²(𝑥) + 𝜙₂²(𝑥) = 1, a circle: the model is still linear in 𝒘, but the decision boundary in the original input space is not.
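A sketch of this quadratic-feature classifier (a direct translation of the formulas above; reading 𝑏 as a decision threshold is an assumption):

    def quad_features(x1, x2):
        """phi(x) = [phi1, phi2, phi1^2, phi2^2] for a 2-D input."""
        return [x1, x2, x1 * x1, x2 * x2]

    def classify(x1, x2, w=(0.0, 0.0, 1.0, 1.0), b=1.0):
        score = sum(wi * fi for wi, fi in zip(w, quad_features(x1, x2)))
        return +1 if score >= b else -1     # boundary: x1^2 + x2^2 = 1, a circle

    print(classify(0.5, 0.5))   # inside the unit circle  -> -1
    print(classify(1.0, 1.0))   # outside the unit circle -> +1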
Naïve Bayes Classifier is a simple probabilistic classifier
- Estimate the likelihood of each word in the vocabulary {Dear, Friend, Lunch, Money} from its counts in the Normal and Spam training messages:
    p(Dear | Normal)   = 8/17 ≈ 0.47        p(Dear | Spam)   = 2/7 ≈ 0.29
    p(Friend | Normal) = 5/17 ≈ 0.29        p(Friend | Spam) = 1/7 ≈ 0.14
    p(Lunch | Normal)  = 3/17 ≈ 0.18        p(Lunch | Spam)  = 0/7 = 0.0
    p(Money | Normal)  = 1/17 ≈ 0.06        p(Money | Spam)  = 4/7 ≈ 0.57
  [Figure: word-count histograms for Normal and Spam messages over the vocabulary {Dear, Friend, Lunch, Money}.]
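A sketch that reproduces these likelihood tables from the word counts shown on the slide (the helper function is illustrative):

    # Word counts read off the slide's histograms.
    normal_counts = {"Dear": 8, "Friend": 5, "Lunch": 3, "Money": 1}   # 17 words in Normal messages
    spam_counts   = {"Dear": 2, "Friend": 1, "Lunch": 0, "Money": 4}   # 7 words in Spam messages

    def likelihoods(counts):
        total = sum(counts.values())
        return {word: count / total for word, count in counts.items()}

    print(likelihoods(normal_counts))   # {'Dear': 0.47..., 'Friend': 0.29..., 'Lunch': 0.18..., 'Money': 0.06...}
    print(likelihoods(spam_counts))     # {'Dear': 0.29..., 'Friend': 0.14..., 'Lunch': 0.0, 'Money': 0.57...}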
Naïve Bayes does not pay attention to the sequence of inputs
- Start with a prior probability:
    P(Normal) = 8/12 ≈ 0.67
    P(Spam)   = 4/12 ≈ 0.33
- Multiply the prior with the likelihood of the message, e.g. for a message containing the words Dear and Friend:
    P(Normal | Dear Friend) ∝ P(Normal) × p(Dear | Normal) × p(Friend | Normal)
    P(Spam | Dear Friend)   ∝ P(Spam) × p(Dear | Spam) × p(Friend | Spam)
  and predict the class with the larger score.
- The equality above is a proportionality: the common normalising term, the probability of the message itself, is dropped.
- Redo the problem with the sentence ‘lunch money money money’.
- Add a constant 𝛼 to each histogram count, so that a word that never appears in one class (such as ‘Lunch’ in Spam) does not force the whole product to zero.
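A sketch that combines the prior, the word likelihoods, and add-𝛼 (Laplace) smoothing, and scores the sentence from the exercise (𝛼 = 1 and the function name are illustrative):

    normal_counts = {"Dear": 8, "Friend": 5, "Lunch": 3, "Money": 1}
    spam_counts   = {"Dear": 2, "Friend": 1, "Lunch": 0, "Money": 4}
    priors = {"Normal": 8 / 12, "Spam": 4 / 12}

    def class_score(message, counts, prior, alpha=1.0):
        """Prior times smoothed word likelihoods; word order is ignored (the 'naive' assumption)."""
        total = sum(counts.values()) + alpha * len(counts)
        score = prior
        for word in message:
            score *= (counts.get(word, 0) + alpha) / total
        return score

    msg = ["Lunch", "Money", "Money", "Money"]
    scores = {"Normal": class_score(msg, normal_counts, priors["Normal"]),
              "Spam":   class_score(msg, spam_counts, priors["Spam"])}
    print(scores, "->", max(scores, key=scores.get))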
What did we cover so far?

                          Classification           Regression
    Predictor function    sign(score)              score
    Relates to correct y  Margin (score × y)       Residual (score − y)
    Loss functions        0–1, Hinge, Logistic     Mean Absolute Error, Mean Squared Error
Summary of today’s lecture
- Next Lecture:
Do you have any questions?
EOP