
Logistic Regression
Classification
Machine Learning
Lecture Slides by Andrew Ng
Classification

Is the email spam? Yes / No
Is the online transaction fraudulent? Yes / No
Is the tumor malignant? Yes / No

y ∈ {0, 1}
0: "Negative Class" (e.g., benign tumor)
1: "Positive Class" (e.g., malignant tumor)

Binary Classification ⇒ Yes | No, True | False, 1 | 0, Positive | Negative

Let's make an email spam filter

- Suppose you want to develop an ML-based model that detects whether a certain email is important or spam.

- Input: x = email message
- Output: y ∈ {spam, not-spam}, encoded as y ∈ {+1, -1}

- Objective: learn a predictor f that maps an input x to a predicted label:

  x → Model → y

- The training dataset, a partial specification of the desired behaviour, is:

  D_train = [("… CS-370 …", -1),
             ("… 10 million USD …", +1),
             ("… PVC pipes at reduced …", +1)]

Example emails:

From: [email protected]
Date: February 13, 2023
Subject: CS-370 Announcement

Dear students,
Please note that the first quiz on CS-370 will be conducted on …
(not spam, y = -1)

From: [email protected]
Date: February 13, 2023
Subject: URGENT!!!

Hello Dear,
I am a Nigerian prince, and I have a business proposal for you ...
(spam, y = +1)

In machine learning, input features are often hand-crafted

- As an example task, suppose we want to predict whether a string x is a valid email address.

- What properties of x might be relevant for predicting y?

- A feature extractor receives an input x and outputs a set of feature-name and feature-value pairs. For the input abc@gmail.com:

  Length>10       : True  → 1
  fracOfAlphabets : 0.85  → 0.85
  Contains_@      : True  → 1
  endsWith_.com   : True  → 1
  endsWith_.edu   : False → 0

- For an input x, its feature vector is

  φ(x) = [φ1(x), φ2(x), …, φd(x)]

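As a rough sketch of such a feature extractor in Python (the function name and the exact thresholds are illustrative assumptions; the feature names mirror the slide):

    # Illustrative sketch of the feature extractor described above.
    def extract_features(x: str) -> dict:
        """Map a raw string x to named feature values."""
        n_alpha = sum(c.isalpha() for c in x)
        return {
            "Length>10": 1 if len(x) > 10 else 0,            # booleans become 0/1
            "fracOfAlphabets": n_alpha / len(x) if x else 0.0,
            "Contains_@": 1 if "@" in x else 0,
            "endsWith_.com": 1 if x.endswith(".com") else 0,
            "endsWith_.edu": 1 if x.endswith(".edu") else 0,
        }

    print(extract_features("abc@gmail.com"))
    # {'Length>10': 1, 'fracOfAlphabets': 0.846..., 'Contains_@': 1,
    #  'endsWith_.com': 1, 'endsWith_.edu': 0}
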
A linear classifier calculates scores to predict classes

- The score of a training example (x, y) is a weighted sum of its features. It represents how confident the model is about a prediction:

  score = w · φ(x) = Σ_{i=1}^{d} w_i φ(x)_i

  Feature vector φ(x) ∈ R^d:       Weight vector w ∈ R^d:
  Length>10       : 1              Length>10       : -1.2
  fracOfAlphabets : 0.85           fracOfAlphabets : 0.6
  Contains_@      : 1              Contains_@      : 3
  endsWith_.com   : 1              endsWith_.com   : 2.2
  endsWith_.edu   : 0              endsWith_.edu   : 2.8

- A linear classifier maps the score to a class using an appropriate function:

  f_w(x) = sign(w · φ(x)) = +1 if w · φ(x) > 0
                            -1 if w · φ(x) < 0
                            ?  if w · φ(x) = 0

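Continuing the sketch, the score and the sign-based prediction can be written as follows (this reuses the hypothetical extract_features from above and the example weights from the slide):

    # Linear classifier: score = w . phi(x), prediction = sign(score).
    w = {"Length>10": -1.2, "fracOfAlphabets": 0.6, "Contains_@": 3.0,
         "endsWith_.com": 2.2, "endsWith_.edu": 2.8}

    def score(w: dict, phi: dict) -> float:
        return sum(w[name] * value for name, value in phi.items())

    def predict(w: dict, phi: dict) -> int:
        s = score(w, phi)
        return +1 if s > 0 else (-1 if s < 0 else 0)  # 0 marks the undefined boundary case

    phi = extract_features("abc@gmail.com")
    print(score(w, phi), predict(w, phi))  # positive score -> +1
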
Relationship between data and weights can be visualised on a 2D plane

- Suppose f_w(x) = sign(w · φ(x)) with

  w = [2, -1]
  φ(x) ∈ {[2, 0], [0, 2], [2, 4]}

[Figure: the three feature vectors plotted in the plane, together with the decision boundary w · φ(x) = 0 and the normal vector w; points on the positive side of the boundary are labelled +, points on the negative side -.]

- In general, a binary classifier f_w defines a hyperplane decision boundary with normal vector w.
- If x ∈ R²: the hyperplane is a line.
- If x ∈ R³: the hyperplane is a plane.

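Plugging the slide's numbers in, with w = [2, -1]:

  φ(x) = [2, 0]: w · φ(x) = 2·2 + (-1)·0 = 4 > 0 → predict +1
  φ(x) = [0, 2]: w · φ(x) = 2·0 + (-1)·2 = -2 < 0 → predict -1
  φ(x) = [2, 4]: w · φ(x) = 2·2 + (-1)·4 = 0 → exactly on the decision boundary 2·φ1(x) - φ2(x) = 0
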
[Figure: Malignant? (1 = Yes, 0 = No) plotted against Tumor Size. A straight line fitted by linear regression, thresholded at 0.5, separates the classes at first; adding one new point with a very large tumor size tilts the fitted line, so the threshold changes and predictions get worse.]

Threshold classifier output f_{w,b}(x) at 0.5:
If f_{w,b}(x) ≥ 0.5, predict "y = 1"
If f_{w,b}(x) < 0.5, predict "y = 0"

Classification: y = 0 or 1, but linear regression's f_{w,b}(x) can be > 1 or < 0.

Logistic Regression: 0 ≤ f_{w,b}(x) ≤ 1, so the output cannot be > 1 or < 0.

Despite its name, Logistic Regression is a classification algorithm (not regression).

Logistic Regression
Hypothesis Representation
Machine Learning
Logistic Regression Model

Want: 0 ≤ f_{w,b}(x) ≤ 1

Linear regression: f_{w,b}(x) = w^T x + b

Logistic regression: f_{w,b}(x) = g(w^T x + b), where

  g(z) = 1 / (1 + e^(-z))

[Figure: g(z) rises smoothly from 0 to 1, crossing 0.5 at z = 0.]

g is called the sigmoid function or logistic function.

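A minimal sketch of the hypothesis in Python (assuming NumPy; the function names are illustrative):

    import numpy as np

    def sigmoid(z):
        """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))."""
        return 1.0 / (1.0 + np.exp(-z))

    def f(x, w, b):
        """Logistic regression hypothesis: estimated P(y = 1 | x)."""
        return sigmoid(np.dot(w, x) + b)

    # Threshold-at-0.5 rule from the slides: predict y = 1 iff f(x) >= 0.5.
    def predict(x, w, b):
        return 1 if f(x, w, b) >= 0.5 else 0
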
Interpretation of Hypothesis Output

f_{w,b}(x) = estimated probability that y = 1 on input x.

Example: if f_{w,b}(x) = 0.7, tell the patient there is a 70% chance of the tumor being malignant.

f_{w,b}(x) = P(y = 1 | x; w, b), "the probability that y = 1, given x, parameterized by w, b".

1 - f_{w,b}(x) = P(y = 0 | x; w, b), "the probability that y = 0, given x, parameterized by w, b".

Logistic Regression
Decision boundary

Machine Learning
Sigmoid function

  g(z) = 1 / (1 + e^(-z))

z = 100:  e^(-z) = e^(-100) (very small), so g(z) ≈ 1
z = -100: e^(-z) = e^(100) (very big), so g(z) ≈ 0
z = 0:    e^(-z) = e^0 = 1, so g(z) = 1/2 = 0.5

Predict "y = 1" if f_{w,b}(x) ≥ 0.5, i.e. z = w^T x + b ≥ 0
Predict "y = 0" if f_{w,b}(x) < 0.5, i.e. z < 0

Decision Boundary

  f_{w,b}(x) = g(b + w1·x1 + w2·x2), with b = -3, w1 = 1, w2 = 1

[Figure: x1-x2 plane with the line x1 + x2 = 3; the region where x1 + x2 ≥ 3 is labelled y = 1, the region where x1 + x2 < 3 is labelled y = 0.]

Predict "y = 1" if -3 + x1 + x2 ≥ 0.

⇒ Decision boundary: -3 + x1 + x2 = 0, i.e. x1 + x2 = 3.

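For instance, at (x1, x2) = (1, 1): -3 + 1 + 1 = -1 < 0, so g(-1) ≈ 0.27 < 0.5 and the prediction is y = 0; at (3, 3): -3 + 3 + 3 = 3 ≥ 0, so the prediction is y = 1.
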
Non-linear decision boundaries

Add polynomial features:

  f_{w,b}(x) = g(b + w1·x1 + w2·x2 + w3·x1² + w4·x2²), with b = -1, w3 = 1, w4 = 1 (and w1 = w2 = 0)

Predict "y = 1" if -1 + x1² + x2² ≥ 0, i.e. x1² + x2² ≥ 1.

[Figure: the decision boundary x1² + x2² = 1 is the circle of radius 1 centred at the origin; points outside the circle are predicted y = 1, points inside y = 0.]

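For instance, the origin (0, 0) gives -1 + 0 + 0 = -1 < 0, so the prediction is y = 0 (inside the circle), while (2, 0) gives -1 + 4 + 0 = 3 ≥ 0, so y = 1 (outside the circle).
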
Logistic Regression
Cost function

Machine Learning
Training set of m examples:

  {(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m))}, with y ∈ {0, 1}

  f_{w,b}(x) = 1 / (1 + e^(-(w^T x + b)))

How should we choose the parameters w, b?

Squared Error Cost function

Linear regression uses the cost

  J(w, b) = (1/m) Σ_{i=1}^{m} (1/2) (f_{w,b}(x^(i)) - y^(i))²

With the linear function f_{w,b}(x) = w^T x + b, this J is "convex". With the logistic function f_{w,b}(x) = g(w^T x + b), it is "non-convex" (many local optima), so gradient descent is not guaranteed to find the global minimum.

Logistic regression cost function

If y = 1: Cost(f_{w,b}(x), y) = -log(f_{w,b}(x))

Useful values: -log(1) = 0 and -log(0) = ∞.

[Figure: cost plotted against f_{w,b}(x) on [0, 1]; the cost is 0 when f_{w,b}(x) = 1 and grows to ∞ as f_{w,b}(x) → 0.]

If y = 0: Cost(f_{w,b}(x), y) = -log(1 - f_{w,b}(x))

[Figure: cost plotted against f_{w,b}(x) on [0, 1]; the cost is 0 when f_{w,b}(x) = 0 and grows to ∞ as f_{w,b}(x) → 1.]

Logistic Regression
Simplified cost function and gradient descent

Machine Learning
Logistic regression cost function

The two cases can be combined into a single expression:

  Cost(f_{w,b}(x), y) = -y log(f_{w,b}(x)) - (1 - y) log(1 - f_{w,b}(x))

If y = 1: (1 - y) = 1 - 1 = 0, so Cost = -log(f_{w,b}(x))
If y = 0: (1 - y) = 1 - 0 = 1 (and y = 0), so Cost = -log(1 - f_{w,b}(x))

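A sketch of this cost in Python (NumPy assumed; the clipping is an implementation detail added to avoid log(0), not something from the slides):

    import numpy as np

    def cost(y, f_x, eps=1e-12):
        """Cross-entropy cost: mean of -y*log(f) - (1-y)*log(1-f)."""
        f_x = np.clip(f_x, eps, 1 - eps)  # keep log() finite
        return np.mean(-y * np.log(f_x) - (1 - y) * np.log(1 - f_x))

    # Confident correct predictions cost little; confident wrong ones cost a lot.
    print(cost(np.array([1, 0]), np.array([0.9, 0.1])))  # ~0.105
    print(cost(np.array([1, 0]), np.array([0.1, 0.9])))  # ~2.303
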
Logistic regression cost function

  J(w, b) = -(1/m) Σ_{i=1}^{m} [ y^(i) log(f_{w,b}(x^(i))) + (1 - y^(i)) log(1 - f_{w,b}(x^(i))) ]

To fit parameters w, b: minimize J(w, b) over w, b.

To make a prediction given a new x:

  Output f_{w,b}(x) = 1 / (1 + e^(-(w^T x + b))), interpreted as P(y = 1 | x; w, b).

Gradient Descent

Want: min over w, b of J(w, b).

Repeat {
  w_j := w_j - α · ∂J(w, b)/∂w_j
  b   := b - α · ∂J(w, b)/∂b
} (simultaneously update all parameters)

The derivatives work out to

  ∂J/∂w_j = (1/m) Σ_{i=1}^{m} (f_{w,b}(x^(i)) - y^(i)) x_j^(i)
  ∂J/∂b   = (1/m) Σ_{i=1}^{m} (f_{w,b}(x^(i)) - y^(i))

Algorithm looks identical to linear regression! The difference is that f_{w,b}(x) is now the sigmoid of a linear function rather than the linear function itself.

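A minimal batch gradient descent sketch (NumPy assumed; X has shape (m, n) with one row per example, and the learning rate and iteration count are illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def fit_logistic(X, y, alpha=0.1, iters=1000):
        """Fit w, b by batch gradient descent on the cross-entropy cost."""
        m, n = X.shape
        w, b = np.zeros(n), 0.0
        for _ in range(iters):
            err = sigmoid(X @ w + b) - y   # f(x_i) - y_i for all i
            w -= alpha * (X.T @ err) / m   # simultaneous update of all w_j
            b -= alpha * err.mean()
        return w, b
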
Logistic Regression
Multi-class classification: One-vs-all

Machine Learning
Multiclass classification

Email foldering/tagging: Work, Friends, Family, Hobby (y ∈ {1, 2, 3, 4})
Medical diagnosis: Not ill, Cold, Flu (y ∈ {1, 2, 3})
Weather: Sunny, Cloudy, Rain, Snow (y ∈ {1, 2, 3, 4})

Binary classification vs. multi-class classification:

[Figure: two x1-x2 scatter plots; the left (binary) shows two classes, the right (multi-class) shows three classes.]

One-vs-all (one-vs-rest):

[Figure: the three-class data set is split into three binary problems; for each class, its examples are treated as positive and all others as negative, and a separate decision boundary is fitted.]

Class 1: f^(1)_{w,b}(x)
Class 2: f^(2)_{w,b}(x)
Class 3: f^(3)_{w,b}(x)

where f^(i)_{w,b}(x) estimates P(y = i | x; w, b) for each class i.

One-vs-all

Train a logistic regression classifier f^(i)_{w,b}(x) for each class i to predict the probability that y = i.

On a new input x, to make a prediction, pick the class i that maximizes f^(i)_{w,b}(x).

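A sketch of the one-vs-all prediction rule (NumPy assumed; classifiers is a hypothetical list of (w, b) pairs, one per class, each trained to output the probability that y equals its class):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def predict_one_vs_all(x, classifiers):
        """Pick the class whose binary classifier is most confident."""
        probs = [sigmoid(np.dot(w, x) + b) for (w, b) in classifiers]
        return int(np.argmax(probs))  # class index i maximizing f^(i)(x)
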
Aspect              | Linear Regression                          | Logistic Regression
--------------------+--------------------------------------------+--------------------------------------
Output              | Continuous (real number)                   | Categorical (binary/multiclass)
Purpose             | Regression (predicting a continuous value) | Classification (predicting categories)
Model equation      | y = w^T x + b                              | P(y = 1) = 1 / (1 + e^(-(w^T x + b)))
Activation function | None (linear output)                       | Sigmoid function
Loss function       | Mean Squared Error (MSE)                   | Cross-Entropy (Log Loss)