Logistic Regression

With Gradient Descent and Regularization


Logistic Regression
◼ A classification method for binary
classification – returns the probability of
target variable y=1
◼ Can be implemented as an NN with sigmoid
activation function
◼ Can be seen as a Bayesian learning method
(maximum likelihood learner) for learning to
predict probability
◼ Can represent non-linear decision boundaries
Hypothesis Representation
◼ Parameters: vector $\theta = (\theta_0, \theta_1, \dots, \theta_n)^T$
◼ The hypothesis is a non-linear function of the input variables $(x_1, \dots, x_n)$
◼ Input data: $D = \{\langle x^1, y^1 \rangle, \dots, \langle x^m, y^m \rangle\}$, each $y^i \in \{0, 1\}$,
  where $x^i = (x_0^i, x_1^i, \dots, x_n^i)^T$ and $x_0^i = 1$ (we add the 0-th feature $x_0 = 1$ to simplify the notation).
◼ The hypothesis $h_\theta$ is obtained by applying the sigmoid function to a linear function of the input variables:
  $h_\theta(x) = \mathrm{sigmoid}(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$ (for logistic regression).
  In contrast, $h_\theta(x) = \theta^T x$ for linear regression.
Sigmoid/logistic function
$\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}$
Sigmoid(x) is monotone and non-linear:
  when $x \to +\infty$, $\mathrm{Sigmoid}(x) \to 1$
  when $x \to -\infty$, $\mathrm{Sigmoid}(x) \to 0$
  when $x = 0$, $\mathrm{Sigmoid}(x) = 0.5$
  $\mathrm{Sigmoid}(x) > 0.5 \Leftrightarrow x > 0$
$0 < \mathrm{Sigmoid}(x) < 1$: Sigmoid is a bounded function, with values in the open interval (0, 1).
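To make the sigmoid and the hypothesis concrete, here is a minimal NumPy sketch (the function names `sigmoid` and `hypothesis` are ours, not from the slides):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid/logistic function: maps any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """h_theta(x) = sigmoid(theta^T x), computed for each row of X.

    X is an (m, n+1) design matrix whose first column is the constant feature
    x_0 = 1; theta is a vector of length n+1.
    """
    return sigmoid(X @ theta)
```

For example, `sigmoid(0.0)` returns 0.5, matching the property listed above.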
Interpretation of the hypothesis –
returning a probability
$h_\theta(x) = \mathrm{sigmoid}(\theta^T x)$
• Given a training data pair (x, y), $h_\theta(x)$ can be interpreted as returning the probability $\Pr(y = 1 \mid x; \theta)$: the probability that the target variable y takes value 1, given the observed $x$ and the parameters $\theta$.
• So if we use $h_\theta$ as a classifier and predict y = 1 if $h_\theta(x) \ge 0.5$, this is equivalent to predicting y = 1 if $\theta^T x \ge 0$.
Interpretation of the hypothesis –
defining a decision boundary
$h_\theta(x) = \mathrm{sigmoid}(\theta^T x)$
Predict y = 1 if $h_\theta(x) \ge 0.5$; otherwise predict y = 0.
This is equivalent to predicting y = 1 if and only if $\theta^T x \ge 0$.
➔ So the decision boundary (separating class 1 and class 0) is defined by the line $\theta^T x = 0$. This is a linear decision boundary in the input variables (when we use only the original input variables). A small code sketch of this rule follows.
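A minimal sketch of the decision rule just described, assuming the same NumPy setup as above (`predict` is an illustrative name):

```python
import numpy as np

def predict(theta, X, threshold=0.5):
    """Predict y = 1 when sigmoid(theta^T x) >= threshold.

    With threshold = 0.5 this is the same as predicting y = 1 when theta^T x >= 0.
    """
    probs = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return (probs >= threshold).astype(int)
```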
Illustration of a linear decision
boundary
• Two input variables $x_1$ and $x_2$. The three triangle points are positive examples and the circle points are negative examples.
• The decision boundary (the red line) is given by the linear equation $x_1 + x_2 - 3 = 0$. Namely, if $x_1 + x_2 \ge 3$, predict y = 1; otherwise predict y = 0.
[Figure: scatter plot in the $(x_1, x_2)$ plane; the red line $x_1 + x_2 = 3$ separates the triangle points from the circle points.]
Illustration of a non-linear decision
boundary
• Still two input variables $x_1$ and $x_2$. The decision boundary (the red circle) is NOT linear in $x_1$ and $x_2$. Using the quadratic features $(x_1 - 3)^2$ and $(x_2 - 3)^2$ with logistic regression, we can get this non-linear decision boundary.
• The red circle is centered at (3, 3) with radius 2; the points on it satisfy the equation $(x_1 - 3)^2 + (x_2 - 3)^2 = 4$.
• The decision: if $(x_1 - 3)^2 + (x_2 - 3)^2 \ge 4$, predict y = 1; otherwise predict y = 0. Note the decision boundary is linear in the new features $(x_1 - 3)^2$ and $(x_2 - 3)^2$.
[Figure: scatter plot in the $(x_1, x_2)$ plane with the red circle of radius 2 centered at (3, 3) as the decision boundary.]
Logistic regression for non-linear
decision boundaries
◼ From the previous slide: using polynomial features + logistic regression, we can handle classification problems with non-linear decision boundaries (a feature-mapping sketch follows below)
◼ The decision boundary is non-linear in the original input variables
◼ But it is linear in the polynomial features
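A sketch of the feature mapping used in the previous example; the center (3, 3) is hard-coded from the illustration, not learned:

```python
import numpy as np

def quadratic_features(X_raw):
    """Map raw inputs (x1, x2) to the feature vector [1, (x1-3)^2, (x2-3)^2].

    Logistic regression on these features has a linear decision boundary in the
    new features, which is a circle centered at (3, 3) in the original plane.
    """
    x1, x2 = X_raw[:, 0], X_raw[:, 1]
    return np.column_stack([np.ones(len(X_raw)), (x1 - 3.0) ** 2, (x2 - 3.0) ** 2])
```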
Loss function
◼ Parameters: vector $\theta = (\theta_0, \theta_1, \dots, \theta_n)^T$
◼ Loss/Cost (mean cross-entropy):
  $J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^i \log h_\theta(x^i) + (1 - y^i) \log\left(1 - h_\theta(x^i)\right) \right]$
  (Remember: each $y^i \in \{0, 1\}$, so only one of the two terms is non-zero for each pair $(x^i, y^i)$.)
◼ Why use this loss function instead of the mean squared error used in linear regression? The reason is that the loss above is convex for logistic regression, whereas the squared-error loss of linear regression is NO longer convex once $h_\theta(x) = \mathrm{sigmoid}(\theta^T x)$ ➔ this would make gradient descent difficult. (A code sketch of this loss follows below.)
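A minimal sketch of the mean cross-entropy loss above (the epsilon clipping is our own addition to avoid log(0)):

```python
import numpy as np

def cross_entropy_loss(theta, X, y, eps=1e-12):
    """J(theta) = -(1/m) * sum_i [ y_i * log(h_i) + (1 - y_i) * log(1 - h_i) ]."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))   # h_theta(x^i) for every example
    h = np.clip(h, eps, 1.0 - eps)           # guard against log(0)
    return -np.sum(y * np.log(h) + (1.0 - y) * np.log(1.0 - h)) / m
```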
Convex function and gradient
descent
◼ For a function f(x) of one variable x, f is convex on the interval [a, b] if $f''(x) \ge 0$ on [a, b]
◼ A multivariate function $f(x_1, \dots, x_n)$ is convex if the Hessian matrix H of f is positive semi-definite (i.e., $z^T H z \ge 0$ for every vector z) – intuitively, the second derivative of f is non-negative in every direction
◼ For a convex function f, every local minimum is a global minimum – this makes gradient descent's job easy: as long as the learning rate $\eta$ is not too big, GD (Gradient Descent) is guaranteed to find an optimal solution.
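As a sketch of why the cross-entropy loss is convex for logistic regression (a standard derivation, not shown on the slide): the Hessian of $J(\theta)$ is

$$H = \frac{1}{m} \sum_{i=1}^{m} h_\theta(x^i)\left(1 - h_\theta(x^i)\right) x^i (x^i)^T,$$

so for any vector $z$,

$$z^T H z = \frac{1}{m} \sum_{i=1}^{m} h_\theta(x^i)\left(1 - h_\theta(x^i)\right) \left((x^i)^T z\right)^2 \ge 0,$$

i.e., $H$ is positive semi-definite and $J(\theta)$ is convex.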
Illustrations of convex and non-convex
functions
[Figure: left, an example of a non-convex function with local minimal points; right, an example of a convex function with only ONE global minimal point.]
Intuition about the loss function
$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^i \log h_\theta(x^i) + (1 - y^i) \log\left(1 - h_\theta(x^i)\right) \right]$
◼ Consider the term $-y^i \log h_\theta(x^i)$ (when $y^i = 1$): $h_\theta(x^i)$ takes values in the (0, 1) interval. Intuitively, when $y^i = 1$, if the predicted value $h_\theta(x^i)$ is close to 1, the loss should be small; the loss should be bigger as $h_\theta(x^i)$ approaches 0. The curve of the function $-\log(x)$ on the interval (0, 1] behaves exactly the desired way: when $h_\theta(x^i) \to 0$, $\log(h_\theta(x^i)) \to -\infty$, and thus $-\log(h_\theta(x^i)) \to +\infty$.
[Figure: plot of $-\log(x)$ for x in (0, 1].]
Intuition about the loss function
$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^i \log h_\theta(x^i) + (1 - y^i) \log\left(1 - h_\theta(x^i)\right) \right]$
◼ Consider the term $-(1 - y^i) \log\left(1 - h_\theta(x^i)\right)$ (when $y^i = 0$): $h_\theta(x^i)$ takes values in the (0, 1) interval. Intuitively, when $y^i = 0$, if the predicted value $h_\theta(x^i)$ is close to 0, the loss should be small; the loss should be bigger as $h_\theta(x^i)$ approaches 1. The curve of the function $-\log(1 - x)$ on the interval [0, 1) behaves exactly the desired way.
[Figure: plot of $-\log(1 - x)$ for x in [0, 1).]
Gradient descent for training logistic
regression classifier
◼ Initialize $\theta$, and select a learning rate $\eta > 0$
◼ Then loop until convergence/termination:
  ◼ Compute $\Delta\theta = -\eta \, \frac{\partial J(\theta)}{\partial \theta}$
  ◼ $\theta \leftarrow \theta + \Delta\theta$
◼ The gradient is $\frac{\partial J(\theta)}{\partial \theta} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^i) - y^i \right) x^i$
◼ So $\Delta\theta = -\eta \, \frac{\partial J(\theta)}{\partial \theta} = \frac{\eta}{m} \sum_{i=1}^{m} \left( y^i - h_\theta(x^i) \right) x^i$
◼ Note the update formula above looks the SAME as the formula for gradient descent in linear regression – but the $h_\theta(x^i) = \mathrm{sigmoid}(\theta^T x^i)$ here is different from the $h_\theta(x^i) = \theta^T x^i$ of linear regression!
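A minimal batch gradient descent sketch following the update rule above (the function name and hyperparameter defaults are illustrative, not from the slides):

```python
import numpy as np

def train_logistic_regression(X, y, eta=0.1, num_iters=1000):
    """Batch gradient descent for logistic regression.

    X: (m, n+1) design matrix with x_0 = 1 in the first column; y: length-m 0/1 labels.
    Each iteration applies theta <- theta + (eta/m) * sum_i (y_i - h_i) * x_i.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))     # h_theta(x^i) for all examples
        theta += (eta / len(y)) * (X.T @ (y - h))  # gradient descent update
    return theta
```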
Regularization
◼ When we have too many input variables, the model may be too complex, with a risk of overfitting
◼ To handle this, add a regularization term to the loss:
  $J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^i \log h_\theta(x^i) + (1 - y^i) \log\left(1 - h_\theta(x^i)\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$   (with $\lambda \ge 0$)
  Note that the regularization term starts from j = 1.
  The gradient also changes: for $j \ge 1$,
  $\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^i) - y^i \right) x_j^i + \frac{\lambda}{m} \theta_j$
  (the j = 0 component keeps its unregularized form).
Regularization
◼ In the regularization term $\frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$, we do NOT regularize $\theta_0$
◼ When $\lambda = 0$ or very small, the regularization takes NO effect → may overfit
◼ When $\lambda$ is very big, to minimize $\frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$ the $\theta_j$ would be forced to take very small values, so the decision boundary would not be good → may underfit (a sketch of the regularized gradient follows below)
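A hedged sketch of the regularized gradient described above; note that $\theta_0$ is excluded from the penalty:

```python
import numpy as np

def regularized_gradient(theta, X, y, lam):
    """Gradient of the regularized loss: (1/m) * X^T (h - y) + (lam/m) * theta,
    with the penalty term zeroed out for theta_0 (the bias is not regularized)."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    grad = X.T @ (h - y) / m
    penalty = (lam / m) * theta
    penalty[0] = 0.0                  # do NOT regularize theta_0
    return grad + penalty
```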
Generalization to classification with more
than two classes
Assume we have 3 classes 𝐶1 , 𝐶2 , and 𝐶3.
We build 3 binary classifiers 𝑀1 , 𝑀2 , 𝑀3 with logistic
regression – using the one-vs-all approach:
1. Generate training data 𝐷1 from the original data D: label
all examples of 𝐶1 as positive and all other examples as
negative.
2. Apply logistic regression to 𝐷1 and build 𝑀1 .
Repeat the above steps (1) and (2) to build 𝑀2 , 𝑀3
For a new data point x, we get 3 probabilities 𝑃1 , 𝑃2 , 𝑃3 by applying
𝑀1 , 𝑀2 , 𝑀3 to the data x. Predict class 𝐶𝑗 , if 𝑃𝑗 is the maximum
among {𝑃1 , 𝑃2 , 𝑃3 }.
Clearly the above method generalizes to any k-class problem with k > 2 (a code sketch of the one-vs-all scheme follows below).
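A sketch of the one-vs-all scheme just described, reusing the hypothetical `train_logistic_regression` sketched on the gradient-descent slide:

```python
import numpy as np

def train_one_vs_all(X, y, classes, **gd_kwargs):
    """Train one binary logistic-regression model per class C_j (C_j vs. the rest)."""
    return {c: train_logistic_regression(X, (y == c).astype(int), **gd_kwargs)
            for c in classes}

def predict_one_vs_all(models, X):
    """For each x, pick the class whose model returns the highest probability P_j."""
    classes = list(models)
    probs = np.column_stack(
        [1.0 / (1.0 + np.exp(-(X @ models[c]))) for c in classes])
    return np.array(classes)[np.argmax(probs, axis=1)]
```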
Practical considerations
◼ Feature scaling should be applied when the
input variables have rather different value
scales, just like in multivariate linear
regression
◼ Learning rate 𝜂 selection should also be done
carefully
◼ The regularization parameter $\lambda$ should also be selected carefully
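For the feature-scaling point above, a minimal standardization sketch (apply it to the raw input variables, before adding the constant $x_0 = 1$ column):

```python
import numpy as np

def standardize(X_raw):
    """Scale each input variable to zero mean and unit variance.

    Returns the scaled features together with (mu, sigma), which must be reused
    to scale new data the same way before prediction.
    """
    mu = X_raw.mean(axis=0)
    sigma = X_raw.std(axis=0)
    sigma[sigma == 0] = 1.0           # avoid dividing by zero for constant features
    return (X_raw - mu) / sigma, mu, sigma
```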