3 Logistic Regression and Regularization
Logistic Regression
• Hypothesis representation
• Cost function
• Regularization
• Multi-class classification
Hypothesis representation
[Figure: tumor classification example: Malignant? (1 = Yes / 0 = No) plotted against Tumor Size]
Linear regression: $h_\theta(x) = \theta^\top x$
Logistic regression: $0 \le h_\theta(x) \le 1$, achieved by setting
$$h_\theta(x) = g(\theta^\top x), \qquad \text{where } g(z) = \frac{1}{1 + e^{-z}}$$
• $g(z)$ is the sigmoid (logistic) function
[Figure: the sigmoid curve $g(z)$ plotted against $z$]
Slide credit: Andrew Ng
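As a minimal sketch of the hypothesis in code (not from the original slides; NumPy-based, with illustrative names `sigmoid` and `hypothesis`):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid function g(z) = 1 / (1 + e^(-z)), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x); the output always lies in (0, 1)."""
    return sigmoid(np.dot(theta, x))
```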
Interpretation of hypothesis output
• $h_\theta(x)$ = estimated probability that $y = 1$ on input $x$
• Example: if $x = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} = \begin{bmatrix} 1 \\ \text{tumorSize} \end{bmatrix}$ and $h_\theta(x) = 0.7$, the model estimates a 70% probability that the tumor is malignant.
Decision boundary
[Figure: training data plotted by Age vs. Tumor Size, separated by a linear decision boundary]
• $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$; e.g., $\theta_0 = -3$, $\theta_1 = 1$, $\theta_2 = 1$
• Predict "$y = 1$" if $-3 + x_1 + x_2 \ge 0$
Slide credit: Andrew Ng
Non-linear decision boundary
• $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2)$
• E.g., $\theta_0 = -1$, $\theta_1 = 0$, $\theta_2 = 0$, $\theta_3 = 1$, $\theta_4 = 1$: predict "$y = 1$" if $x_1^2 + x_2^2 \ge 1$, i.e. on or outside the unit circle.
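A quick numerical check of both example boundaries (a hypothetical sketch; the quadratic feature vector $(1, x_1, x_2, x_1^2, x_2^2)$ is constructed by hand for clarity):

```python
import numpy as np

def predict(theta, features):
    """Predict y = 1 exactly when theta^T x >= 0, i.e. h_theta(x) >= 0.5."""
    return int(np.dot(theta, features) >= 0)

# Linear boundary: theta = (-3, 1, 1) predicts y = 1 when x1 + x2 >= 3.
theta_lin = np.array([-3.0, 1.0, 1.0])
print(predict(theta_lin, [1.0, 2.0, 2.0]))               # 2 + 2 = 4 >= 3 -> 1

# Circular boundary: theta = (-1, 0, 0, 1, 1) on (1, x1, x2, x1^2, x2^2)
# predicts y = 1 when x1^2 + x2^2 >= 1 (on or outside the unit circle).
theta_circ = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])
x1, x2 = 0.5, 0.5
print(predict(theta_circ, [1.0, x1, x2, x1**2, x2**2]))  # 0.5 < 1 -> 0
```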
Training set with $m$ examples:
$$\{ (x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \cdots, (x^{(m)}, y^{(m)}) \}$$
$$x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}, \qquad x_0 = 1, \qquad y \in \{0, 1\}$$
$$h_\theta(x) = \frac{1}{1 + e^{-\theta^\top x}}$$
Slide credit: Andrew Ng
Cost function for Linear Regression
$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})$$
where
$$\mathrm{Cost}(h_\theta(x), y) = \frac{1}{2} \left( h_\theta(x) - y \right)^2$$
With the sigmoid hypothesis, this squared-error cost is non-convex in $\theta$, so logistic regression uses a different cost function.
Slide credit: Andrew Ng
Logistic regression cost function
$$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log h_\theta(x) & \text{if } y = 1 \\ -\log\left(1 - h_\theta(x)\right) & \text{if } y = 0 \end{cases}$$
• If $y = 1$: $\mathrm{Cost} = -\log h_\theta(x)$, which is $0$ when $h_\theta(x) = 1$ and grows without bound as $h_\theta(x) \to 0$.
• If $y = 0$: $\mathrm{Cost} = -\log(1 - h_\theta(x))$, which is $0$ when $h_\theta(x) = 0$ and grows without bound as $h_\theta(x) \to 1$.
[Figure: the two cost curves plotted against $h_\theta(x) \in [0, 1]$, one panel for $y = 1$ and one for $y = 0$]
Slide credit: Andrew Ng
Logistic regression
$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)}) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$
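A direct NumPy translation of this cost (a sketch, not from the slides; `X` is assumed to be the $m \times (n+1)$ design matrix with $x_0 = 1$, and the small `eps` clip guarding against $\log 0$ is an implementation detail added here):

```python
import numpy as np

def cost(theta, X, y, eps=1e-12):
    """Cross-entropy cost J(theta) for logistic regression.

    X: (m, n+1) design matrix with a leading column of ones; y: (m,) 0/1 labels.
    """
    h = 1.0 / (1.0 + np.exp(-X @ theta))   # h_theta(x^(i)) for all i at once
    h = np.clip(h, eps, 1.0 - eps)         # guard against log(0)
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))
```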
Gradient descent
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$
Goal: $\min_\theta J(\theta)$
Good news: convex function! Bad news: no analytical solution.
Repeat {
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$
} (simultaneously update all $\theta_j$)
where
$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
Slide credit: Andrew Ng
Gradient descent, with the gradient substituted into the update:
Repeat {
$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
} (simultaneously update all $\theta_j$)
This update rule is identical in form to the one for linear regression; only the definition of $h_\theta(x)$ differs.
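The update rule in code (a sketch under the same conventions as the cost example above; the learning rate and iteration count are arbitrary illustrative choices):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent for logistic regression.

    Uses the vectorized gradient (1/m) * X^T (h - y), which updates
    all theta_j simultaneously.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = 1.0 / (1.0 + np.exp(-X @ theta))   # predictions for all m examples
        theta -= alpha * (X.T @ (h - y)) / m   # simultaneous update of theta
    return theta
```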
Multi-class classification
• Email foldering/tagging: Work, Friends, Family, Hobby
[Figure: binary classification vs. multi-class classification in the ($x_1$, $x_2$) plane]
One-vs-all (one-vs-rest)
[Figure: the three-class problem split into three binary problems; classifier $h_\theta^{(i)}(x)$ separates class $i$ from the other two classes]
$$h_\theta^{(i)}(x) = P(y = i \mid x; \theta) \qquad (i = 1, 2, 3)$$
Slide credit: Andrew Ng
One-vs-all
• Train a logistic regression classifier $h_\theta^{(i)}(x)$ for each class $i$ to predict the probability that $y = i$.
• On a new input $x$, predict the class $i$ that maximizes $h_\theta^{(i)}(x)$, as in the sketch below.
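A self-contained one-vs-all sketch (hypothetical helper names; any binary logistic-regression trainer would work in place of `train_logistic`):

```python
import numpy as np

def train_logistic(X, y, alpha=0.1, iters=1000):
    """Fit one binary logistic-regression classifier by gradient descent."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        h = 1.0 / (1.0 + np.exp(-X @ theta))
        theta -= alpha * (X.T @ (h - y)) / X.shape[0]
    return theta

def one_vs_all(X, y, num_classes):
    """Train one classifier per class: relabel class i as 1, the rest as 0."""
    return np.vstack([train_logistic(X, (y == i).astype(float))
                      for i in range(num_classes)])

def predict_class(thetas, x):
    """Pick the class whose classifier outputs the largest h_theta^(i)(x).

    The sigmoid is monotonic, so argmax over theta^T x gives the same class.
    """
    return int(np.argmax(thetas @ x))
```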
Addressing overfitting
[Figure: three fits of Price vs. Size: underfitting, a good fit, and overfitting]
Options:
1. Reduce the number of features.
― Manually select which features to keep.
― Model selection algorithm (later in course).
2. Regularization.
― Keep all the features, but reduce the magnitude/values of the parameters $\theta_j$.
― Works well when we have a lot of features, each of which contributes a bit to predicting $y$.
Regularization: cost function
Intuition
[Figure: two fits of Price vs. Size, a quadratic fit and an overfit higher-order polynomial]
Penalizing large parameter values, e.g. adding $1000\,\theta_3^2 + 1000\,\theta_4^2$ to the cost, drives $\theta_3$ and $\theta_4$ toward $0$ and reduces the higher-order hypothesis to a nearly quadratic, smoother fit.
Regularization
Small values for the parameters $\theta_1, \ldots, \theta_n$ correspond to a "simpler" hypothesis that is less prone to overfitting.
[Figure: regularized fit of Price vs. Size of house]
In regularized linear regression, we choose $\theta$ to minimize
$$J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$$
where $\lambda$ is the regularization parameter. By convention $\theta_0$ is not penalized; if $\lambda$ is set too large, all $\theta_j$ for $j \ge 1$ are driven toward $0$ and the model underfits.
Regularized linear regression
Gradient descent
Repeat {
$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}$$
$$\theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \right] \qquad (j = 1, \ldots, n)$$
}
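In code, the regularized update only adds the $(\lambda/m)\,\theta_j$ shrinkage term, with $\theta_0$ left unpenalized (a sketch for the linear-regression case, where $h_\theta(x) = \theta^\top x$):

```python
import numpy as np

def regularized_gd_step(theta, X, y, alpha, lam):
    """One gradient-descent step for regularized linear regression."""
    m = X.shape[0]
    grad = X.T @ (X @ theta - y) / m   # unregularized gradient
    reg = (lam / m) * theta            # shrinkage term (lambda/m) * theta_j
    reg[0] = 0.0                       # by convention, do not penalize theta_0
    return theta - alpha * (grad + reg)
```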
Normal equation
$$\theta = \left( X^\top X + \lambda \, \mathrm{diag}(0, 1, \ldots, 1) \right)^{-1} X^\top y$$
where $\mathrm{diag}(0, 1, \ldots, 1)$ is $(n+1) \times (n+1)$ with the zero in the $\theta_0$ position.
Non-invertibility (optional/advanced)
Suppose $m \le n$ ($m$ = #examples, $n$ = #features). Then $X^\top X$ is non-invertible (singular). If $\lambda > 0$, the regularized matrix $X^\top X + \lambda \, \mathrm{diag}(0, 1, \ldots, 1)$ is invertible, so regularization also resolves this issue. See the sketch below.
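A sketch of the regularized normal equation; with $\lambda > 0$ the linear system is solvable even when there are fewer examples than features:

```python
import numpy as np

def normal_equation(X, y, lam):
    """Solve (X^T X + lam * M) theta = X^T y with M = diag(0, 1, ..., 1)."""
    n = X.shape[1]
    M = np.eye(n)
    M[0, 0] = 0.0                      # leave the intercept unregularized
    return np.linalg.solve(X.T @ X + lam * M, X.T @ y)

# Example: m = 3 examples, n = 6 columns (m < n), still solvable for lam > 0.
X = np.hstack([np.ones((3, 1)), np.random.randn(3, 5)])
y = np.random.randn(3)
theta = normal_equation(X, y, lam=1.0)
```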
References
Andrew Ng's slides on Logistic Regression and Regularization from his Machine Learning course on Coursera.
Disclaimer
The content of this presentation is not original; it has been prepared from various sources for teaching purposes.