3 Logistic Regression and Regularization

This document discusses logistic regression. It covers the hypothesis representation using the sigmoid function, the cost function for logistic regression using maximum likelihood estimation, training logistic regression using gradient descent, regularization to address overfitting, and multi-class classification using a one-vs-all approach with multiple logistic regression classifiers.


Logistic Regression

Logistic Regression
• Hypothesis representation

• Cost function

• Logistic regression with gradient descent

• Regularization

• Multi-class classification
[Figure: malignant? (1 = Yes, 0 = No) plotted against tumor size, with a linear regression fit $h_\theta(x) = \theta^\top x$]

• Threshold the classifier output $h_\theta(x)$ at 0.5:
  – If $h_\theta(x) \geq 0.5$, predict "$y = 1$"
  – If $h_\theta(x) < 0.5$, predict "$y = 0$"
Slide credit: Andrew Ng
Classification: $y = 1$ or $y = 0$

$h_\theta(x) = \theta^\top x$ (from linear regression) can be $> 1$ or $< 0$

Logistic regression: $0 \leq h_\theta(x) \leq 1$

Despite its name, logistic regression is a classification algorithm.

Slide credit: Andrew Ng
Hypothesis representation
• Want $0 \leq h_\theta(x) \leq 1$

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^\top x}}$$

• $h_\theta(x) = g(\theta^\top x)$, where $g(z) = \dfrac{1}{1 + e^{-z}}$ is the sigmoid (logistic) function

[Figure: plot of the sigmoid $g(z)$ against $z$]

Slide credit: Andrew Ng
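To make the hypothesis concrete, here is a minimal NumPy sketch (my own illustration, not part of the original slides) of the sigmoid function, the hypothesis $h_\theta(x) = g(\theta^\top x)$, and the 0.5-threshold prediction rule; it assumes the feature matrix X already contains the intercept column $x_0 = 1$.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """h_theta(x) = g(theta^T x) for every row of X (X includes the x_0 = 1 column)."""
    return sigmoid(X @ theta)

def predict(theta, X, threshold=0.5):
    """Predict y = 1 when h_theta(x) >= threshold, else y = 0."""
    return (hypothesis(theta, X) >= threshold).astype(int)
```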
Interpretation of hypothesis output
• $h_\theta(x)$ = estimated probability that $y = 1$ on input $x$, i.e. $h_\theta(x) = P(y = 1 \mid x; \theta)$

• Example: if $x = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} = \begin{bmatrix} 1 \\ \text{tumorSize} \end{bmatrix}$ and $h_\theta(x) = 0.7$,
  tell the patient there is a 70% chance of the tumor being malignant.

Slide credit: Andrew Ng


Logistic regression
$h_\theta(x) = g(\theta^\top x)$, where $g(z) = \dfrac{1}{1 + e^{-z}}$ and $z = \theta^\top x$

[Figure: sigmoid $g(z)$; note that $g(z) \geq 0.5$ exactly when $z \geq 0$]

Suppose we predict "$y = 1$" if $h_\theta(x) \geq 0.5$, i.e. $z = \theta^\top x \geq 0$,
and predict "$y = 0$" if $h_\theta(x) < 0.5$, i.e. $z = \theta^\top x < 0$.
Slide credit: Andrew Ng
Decision boundary
• $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$

  E.g., $\theta_0 = -3$, $\theta_1 = 1$, $\theta_2 = 1$

• Predict "$y = 1$" if $-3 + x_1 + x_2 \geq 0$

[Figure: training examples in the tumor size / age plane, separated by the linear decision boundary $x_1 + x_2 = 3$]
Slide credit: Andrew Ng
• $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2)$

  E.g., $\theta_0 = -1$, $\theta_1 = 0$, $\theta_2 = 0$, $\theta_3 = 1$, $\theta_4 = 1$

• Predict "$y = 1$" if $-1 + x_1^2 + x_2^2 \geq 0$ (a circular decision boundary)

• Higher-order polynomial features yield more complex decision boundaries:
  $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_1^2 x_2 + \theta_5 x_1^2 x_2^2 + \theta_6 x_1^3 x_2 + \cdots)$
Slide credit: Andrew Ng
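As a hedged illustration of the circular-boundary example above (the map_features helper and the sample points are my own; only the parameter values $\theta = (-1, 0, 0, 1, 1)$ come from the slide), the quadratic features can be built explicitly and the decision rule $\theta^\top x \geq 0$ applied directly:

```python
import numpy as np

def map_features(x1, x2):
    """Build [1, x1, x2, x1^2, x2^2] for the quadratic hypothesis on the slide."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2])

# Example parameters from the slide: theta = (-1, 0, 0, 1, 1)
theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])

# A few made-up test points
x1 = np.array([0.2, 1.5, -0.3, 2.0])
x2 = np.array([0.1, 0.0,  1.1, -1.0])
X = map_features(x1, x2)

# Predict y = 1 exactly when theta^T x >= 0, i.e. x1^2 + x2^2 >= 1 (a circle of radius 1)
predictions = (X @ theta >= 0).astype(int)
print(predictions)   # -> [0 1 1 1]
```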
Training set with 𝑚 examples
$$\{ (x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)}) \}$$

$$x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}, \qquad x_0 = 1, \qquad y \in \{0, 1\}$$

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^\top x}}$$
Slide credit: Andrew Ng
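One common way to lay out such a training set in code (an implementation assumption of mine, not something the slides prescribe) is an m × (n+1) matrix whose first column holds the constant feature $x_0 = 1$:

```python
import numpy as np

# Hypothetical raw features: m = 4 examples, n = 2 features each
X_raw = np.array([[2.0, 3.0],
                  [1.0, 0.5],
                  [4.0, 2.2],
                  [3.5, 1.0]])
y = np.array([0, 0, 1, 1])   # labels y in {0, 1}

# Prepend the x_0 = 1 column so that theta_0 acts as the intercept
X = np.column_stack([np.ones(X_raw.shape[0]), X_raw])
print(X.shape)   # (4, 3): m x (n + 1)
```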
Cost function for Linear Regression
$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\!\left( h_\theta(x^{(i)}), y^{(i)} \right)$$

$$\mathrm{Cost}(h_\theta(x), y) = \frac{1}{2} \left( h_\theta(x) - y \right)^2$$

Slide credit: Andrew Ng


Cost function for Logistic Regression
$$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log h_\theta(x) & \text{if } y = 1 \\ -\log\!\left(1 - h_\theta(x)\right) & \text{if } y = 0 \end{cases}$$

[Figure: the two cost curves as functions of $h_\theta(x) \in [0, 1]$, for $y = 1$ (left) and $y = 0$ (right)]
Slide credit: Andrew Ng
Logistic regression cost function
• $\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log h_\theta(x) & \text{if } y = 1 \\ -\log\!\left(1 - h_\theta(x)\right) & \text{if } y = 0 \end{cases}$

• Equivalently: $\mathrm{Cost}(h_\theta(x), y) = -y \log h_\theta(x) - (1 - y) \log\!\left(1 - h_\theta(x)\right)$

• If $y = 1$: $\mathrm{Cost}(h_\theta(x), y) = -\log h_\theta(x)$
• If $y = 0$: $\mathrm{Cost}(h_\theta(x), y) = -\log\!\left(1 - h_\theta(x)\right)$
Slide credit: Andrew Ng
Logistic regression
$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\!\left( h_\theta(x^{(i)}), y^{(i)} \right) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\!\left(1 - h_\theta(x^{(i)})\right) \right]$$

Learning: fit the parameters $\theta$ by solving $\min_\theta J(\theta)$

Prediction: given a new $x$, output $h_\theta(x) = \dfrac{1}{1 + e^{-\theta^\top x}}$

Slide credit: Andrew Ng
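A short NumPy sketch of this cost function (my own illustration; the small eps clip is a floating-point safeguard against log(0), not part of the formula):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y, eps=1e-12):
    """Cross-entropy cost J(theta) = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ]."""
    m = y.shape[0]
    h = sigmoid(X @ theta)
    h = np.clip(h, eps, 1.0 - eps)   # avoid log(0)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
```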


Gradient descent
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\!\left(1 - h_\theta(x^{(i)})\right) \right]$$

Goal: $\min_\theta J(\theta)$
Good news: $J(\theta)$ is convex!
Bad news: no analytical (closed-form) solution

Repeat {
  $\theta_j := \theta_j - \alpha \dfrac{\partial}{\partial \theta_j} J(\theta)$   (simultaneously update all $\theta_j$)
}

$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
Slide credit: Andrew Ng
Gradient descent
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\!\left(1 - h_\theta(x^{(i)})\right) \right]$$

Goal: $\min_\theta J(\theta)$

Repeat {
  $\theta_j := \theta_j - \alpha \dfrac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$   (simultaneously update all $\theta_j$)
}

Slide credit: Andrew Ng


Gradient descent for Linear Regression
Repeat {
  $\theta_j := \theta_j - \alpha \dfrac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$   with $h_\theta(x) = \theta^\top x$
}

Gradient descent for Logistic Regression
Repeat {
  $\theta_j := \theta_j - \alpha \dfrac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$   with $h_\theta(x) = \dfrac{1}{1 + e^{-\theta^\top x}}$
}

The update rule looks identical; the two algorithms differ only in the definition of the hypothesis $h_\theta(x)$.
Slide credit: Andrew Ng
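Putting the update rule into a loop gives the following sketch (the learning rate, iteration count, and zero initialization are placeholder choices of mine, not values from the slides); the vectorized expression Xᵀ(h − y)/m computes all partial derivatives at once, so every θ_j is updated simultaneously:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent for logistic regression.

    X is m x (n+1) with a leading column of ones; y is a length-m 0/1 vector.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        h = sigmoid(X @ theta)        # h_theta(x^(i)) for all i
        grad = X.T @ (h - y) / m      # (1/m) * sum (h - y) * x_j, for all j at once
        theta -= alpha * grad         # simultaneous update of every theta_j
    return theta
```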
Multi-class classification
• Email foldering/tagging: Work, Friends, Family, Hobby

• Medical diagnosis: Not ill, Cold, Flu

• Weather: Sunny, Cloudy, Rain, Snow

Slide credit: Andrew Ng


Binary classification vs. multiclass classification

[Figure: left, binary classification data (two classes) in the $x_1$–$x_2$ plane; right, multiclass data (three classes)]
One-vs-all (one-vs-rest)
[Figure: the three-class data split into three binary problems; classifiers $h_\theta^{(1)}(x)$, $h_\theta^{(2)}(x)$, $h_\theta^{(3)}(x)$ each separate class 1, 2, or 3 from the rest in the $x_1$–$x_2$ plane]

$$h_\theta^{(i)}(x) = P(y = i \mid x; \theta) \qquad (i = 1, 2, 3)$$
Slide credit: Andrew Ng
One-vs-all
• Train a logistic regression classifier $h_\theta^{(i)}(x)$ for each class $i$ to predict the probability that $y = i$

• Given a new input $x$, pick the class $i$ that maximizes
  $$\max_i \; h_\theta^{(i)}(x)$$
Slide credit: Andrew Ng
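A self-contained sketch of one-vs-all (the helper names train_binary, one_vs_all_train, and one_vs_all_predict are my own, hypothetical functions, not from the slides): train one binary classifier per class, then pick the class whose classifier outputs the largest probability.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_binary(X, y, alpha=0.1, num_iters=1000):
    """Plain batch gradient descent for a single binary logistic regression classifier."""
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / len(y)
        theta -= alpha * grad
    return theta

def one_vs_all_train(X, y, classes):
    """One classifier per class: examples of class c are the positives, everything else negative."""
    return {c: train_binary(X, (y == c).astype(float)) for c in classes}

def one_vs_all_predict(models, X):
    """For each row of X, pick the class whose classifier h_theta^(i)(x) is largest."""
    classes = list(models)
    scores = np.column_stack([sigmoid(X @ models[c]) for c in classes])
    return np.array(classes)[np.argmax(scores, axis=1)]
```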
Regularization: The problem of overfitting
Example: Linear regression (housing prices)

[Figure: three fits of price vs. size, from an underfit straight line, to a good quadratic fit, to an overfit high-order polynomial]

Overfitting: If we have too many features, the learned hypothesis may fit the training set very well ($J(\theta) \approx 0$), but fail to generalize to new examples (e.g., predict prices for houses not in the training set).
Andrew Ng
Addressing overfitting:

Options:
1. Reduce the number of features.
   ― Manually select which features to keep.
   ― Model selection algorithm (later in the course).
2. Regularization.
   ― Keep all the features, but reduce the magnitude/values of the parameters $\theta_j$.
   ― Works well when we have a lot of features, each of which contributes a bit to predicting $y$.

Andrew Ng
Regularization: Cost function intuition

[Figure: two fits of price vs. size of house, a quadratic fit and a higher-order polynomial fit]

Suppose we penalize two of the higher-order parameters and make them really small; the high-order fit then behaves essentially like the quadratic one.

Andrew Ng
Regularization

Small values for the parameters $\theta_0, \theta_1, \ldots, \theta_n$:
― "Simpler" hypothesis
― Less prone to overfitting

Housing example:
― Features: $x_1, x_2, \ldots, x_n$
― Parameters: $\theta_0, \theta_1, \ldots, \theta_n$

Andrew Ng
Regularization

[Figure: price vs. size of house, with a regularized fit]

Andrew Ng
In regularized linear regression, we choose $\theta$ to minimize

$$J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$$

What if $\lambda$ is set to an extremely large value (perhaps too large for our problem)?
The penalty then drives $\theta_1, \ldots, \theta_n$ toward zero and the hypothesis underfits.

[Figure: price vs. size of house; with a very large $\lambda$ the fit degenerates to a nearly flat line $h_\theta(x) \approx \theta_0$]

Andrew Ng
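A minimal sketch of the regularized cost above (my own illustration, assuming the standard convention that the intercept $\theta_0$ is left out of the penalty):

```python
import numpy as np

def regularized_linear_cost(theta, X, y, lam):
    """J(theta) = (1/2m) * [ sum (h - y)^2 + lam * sum_{j>=1} theta_j^2 ]."""
    m = y.shape[0]
    residual = X @ theta - y                 # h_theta(x) = theta^T x for linear regression
    penalty = lam * np.sum(theta[1:] ** 2)   # theta_0 is not regularized
    return (residual @ residual + penalty) / (2 * m)
```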
Regularization: Regularized linear regression
Gradient descent

Repeat {
  $\theta_0 := \theta_0 - \alpha \dfrac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}$

  $\theta_j := \theta_j - \alpha \left[ \dfrac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \dfrac{\lambda}{m} \theta_j \right]$   $(j = 1, 2, \ldots, n)$
}

Andrew Ng
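A sketch of this regularized update in vectorized form (learning rate and iteration count are placeholder values of mine): $\theta_0$ is updated without the penalty term, and every other $\theta_j$ gets the extra $(\lambda/m)\theta_j$.

```python
import numpy as np

def regularized_gradient_descent(X, y, lam, alpha=0.01, num_iters=1000):
    """Gradient descent for regularized linear regression (theta_0 left unpenalized)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        error = X @ theta - y
        grad = X.T @ error / m              # unregularized gradient, all j at once
        grad[1:] += (lam / m) * theta[1:]   # add the regularization term for j >= 1
        theta -= alpha * grad
    return theta
```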
Normal equation

$$\theta = \left( X^\top X + \lambda M \right)^{-1} X^\top y, \qquad M = \begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix}$$

($M$ is the $(n+1) \times (n+1)$ identity matrix with its top-left entry set to 0, so that $\theta_0$ is not regularized)

Andrew Ng
Non-invertibility (optional/advanced)

Suppose $m \leq n$ ($m$ = #examples, $n$ = #features). Then $X^\top X$ is singular (non-invertible).

If $\lambda > 0$, the matrix $X^\top X + \lambda M$ above is invertible, so regularization also fixes this problem.

Andrew Ng
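And a sketch of the regularized normal equation from the previous slide (my own illustration); for $\lambda > 0$ the matrix being inverted is invertible even when $m \leq n$.

```python
import numpy as np

def regularized_normal_equation(X, y, lam):
    """Closed-form theta = (X^T X + lam * M)^{-1} X^T y, with M = identity except M[0, 0] = 0."""
    n = X.shape[1]
    M = np.eye(n)
    M[0, 0] = 0.0   # do not regularize the intercept theta_0
    return np.linalg.solve(X.T @ X + lam * M, X.T @ y)
```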
References
 Andrew Ng’s slides on Logistic Regression and Regularization from his Machine Learning course on Coursera.

Andrew Ng
Disclaimer
 The content of this presentation is not original; it has been prepared from various sources for teaching purposes.

Andrew Ng
