Lec18 Logistic Regression

The document compares Naïve Bayes and Perceptron, highlighting that Naïve Bayes predicts classes based on probabilities while Perceptron assigns classes based on the sign of a linear combination of inputs. It also discusses Logistic Regression, which uses a probabilistic approach to classify data and emphasizes the importance of the cost function and maximum likelihood estimation in training the model. Additionally, it covers regularized logistic regression to prevent overfitting by adding a penalty term to the cost function.


Naïve Bayes vs Perceptron

• Naïve Bayes predicts classes based on the probability of the instance belonging to each class
• It learns $P(Y = y_k \mid X = \boldsymbol{x}_i)$
• Perceptron does not produce probability estimates
• It estimates $\boldsymbol{\theta}$ from the training data
• For new data, it computes the sign of $\boldsymbol{\theta}^T \boldsymbol{x}_i$
• Based on the sign, it assigns a class
Logistic Regression
• It takes a probabilistic approach to learn a classifier (a function)
• $h_{\boldsymbol{\theta}}(\boldsymbol{x})$ should give $p(y = 1 \mid \boldsymbol{x}; \boldsymbol{\theta})$
• We want $0 \le h_{\boldsymbol{\theta}}(\boldsymbol{x}) \le 1$
• Logistic regression model:
$$h_{\boldsymbol{\theta}}(\boldsymbol{x}) = g(\boldsymbol{\theta}^T \boldsymbol{x}), \qquad g(z) = \frac{1}{1 + e^{-z}}$$
$$h_{\boldsymbol{\theta}}(\boldsymbol{x}) = \frac{1}{1 + e^{-\boldsymbol{\theta}^T \boldsymbol{x}}} = \frac{e^{\boldsymbol{\theta}^T \boldsymbol{x}}}{1 + e^{\boldsymbol{\theta}^T \boldsymbol{x}}}$$
• The sigmoid first computes the real-valued score $\boldsymbol{\theta}^T \boldsymbol{x}$ and then squashes it between 0 and 1 so that it can be interpreted as a probability.
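A minimal NumPy sketch of this hypothesis (the names sigmoid and predict_proba and the toy data are illustrative assumptions, not from the slides):

```python
import numpy as np

def sigmoid(z):
    """Squash a real-valued score into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(theta, X):
    """h_theta(x) = g(theta^T x) for each row of X (n_samples, n_features)."""
    return sigmoid(X @ theta)

# Example: 3 instances with a leading column of 1s for the bias term.
X = np.array([[1.0, 0.5, -1.2],
              [1.0, 2.0,  0.3],
              [1.0, -0.7, 1.5]])
theta = np.array([0.1, 1.0, -0.5])
print(predict_proba(theta, X))   # each value lies between 0 and 1
```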
Interpreting Hypothesis Output
• $h_{\boldsymbol{\theta}}(\boldsymbol{x})$ = estimated $p(y = 1 \mid \boldsymbol{x}; \boldsymbol{\theta}) = \dfrac{1}{1 + e^{-\boldsymbol{\theta}^T \boldsymbol{x}}} = \dfrac{e^{\boldsymbol{\theta}^T \boldsymbol{x}}}{1 + e^{\boldsymbol{\theta}^T \boldsymbol{x}}}$
• Note: $p(y = 0 \mid \boldsymbol{x}; \boldsymbol{\theta}) + p(y = 1 \mid \boldsymbol{x}; \boldsymbol{\theta}) = 1$
• So $p(y = 0 \mid \boldsymbol{x}; \boldsymbol{\theta}) = 1 - p(y = 1 \mid \boldsymbol{x}; \boldsymbol{\theta}) = 1 - \dfrac{1}{1 + e^{-\boldsymbol{\theta}^T \boldsymbol{x}}} = \dfrac{1}{1 + e^{\boldsymbol{\theta}^T \boldsymbol{x}}}$
• The log-odds (logit) of the model:
$$\log \frac{p(y = 1 \mid \boldsymbol{x}; \boldsymbol{\theta})}{p(y = 0 \mid \boldsymbol{x}; \boldsymbol{\theta})} = \log e^{\boldsymbol{\theta}^T \boldsymbol{x}} = \boldsymbol{\theta}^T \boldsymbol{x}$$
• Thus, if $\boldsymbol{\theta}^T \boldsymbol{x} > 0$, the positive class is more probable
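A quick numeric check (with assumed toy values) that the log-odds of the sigmoid model equals the linear score $\boldsymbol{\theta}^T \boldsymbol{x}$:

```python
import numpy as np

theta = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 0.8, 0.3])          # leading 1 is the bias term

score = theta @ x
p1 = 1.0 / (1.0 + np.exp(-score))      # p(y = 1 | x; theta)
p0 = 1.0 - p1                          # p(y = 0 | x; theta)

print(np.log(p1 / p0), score)          # both print the same value (0.3)
```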


Logistic Regression
• $h_{\boldsymbol{\theta}}(\boldsymbol{x}) = g(\boldsymbol{\theta}^T \boldsymbol{x})$, where $g(z) = \dfrac{1}{1 + e^{-z}}$
• $\boldsymbol{\theta}^T \boldsymbol{x}$ should be a large negative value for negative instances
• $\boldsymbol{\theta}^T \boldsymbol{x}$ should be a large positive value for positive instances

• Assume a threshold and predict:
• $y = 1$ if $h_{\boldsymbol{\theta}}(\boldsymbol{x}) \ge 0.5$ (i.e., $\boldsymbol{\theta}^T \boldsymbol{x} \ge 0$)
• $y = 0$ if $h_{\boldsymbol{\theta}}(\boldsymbol{x}) < 0.5$ (i.e., $\boldsymbol{\theta}^T \boldsymbol{x} < 0$)
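A minimal sketch of this 0.5-threshold decision rule (the helper name predict is an assumption, building on the sigmoid sketch above):

```python
import numpy as np

def predict(theta, X, threshold=0.5):
    """Return 1 where h_theta(x) >= threshold, else 0.
    With threshold = 0.5 this is equivalent to checking the sign of theta^T x."""
    probs = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return (probs >= threshold).astype(int)
```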
Non-linear Decision Boundary
• Can apply a basis function expansion to the features, e.g.,
$$\boldsymbol{x} = \begin{bmatrix} 1 \\ x_1 \\ x_2 \end{bmatrix} \;\rightarrow\; \begin{bmatrix} 1 \\ x_1 \\ x_2 \\ x_1 x_2 \\ x_1^2 \\ x_2^2 \\ x_1^2 x_2 \\ x_1 x_2^2 \end{bmatrix}$$
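A small sketch of the polynomial basis expansion above (the function name expand_features is an illustrative choice):

```python
import numpy as np

def expand_features(x1, x2):
    """Map [1, x1, x2] to [1, x1, x2, x1*x2, x1^2, x2^2, x1^2*x2, x1*x2^2]."""
    return np.array([1.0, x1, x2, x1 * x2, x1**2, x2**2, x1**2 * x2, x1 * x2**2])

# Logistic regression on these expanded features yields a non-linear
# decision boundary in the original (x1, x2) space.
print(expand_features(2.0, 3.0))
```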

Logistic Regression Cost Function
• We should not use the squared loss as in linear regression:
$$J(\boldsymbol{\theta}) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)}) - y^{(i)} \right)^2$$
• With the logistic regression model $h_{\boldsymbol{\theta}}(\boldsymbol{x}) = \dfrac{1}{1 + e^{-\boldsymbol{\theta}^T \boldsymbol{x}}}$, this leads to a non-convex cost function
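An illustrative sketch (toy 1-D data with assumed values) that evaluates this squared-loss cost on a grid of parameter values; plotting or inspecting the resulting curve is one way to see that the cost is not convex in general:

```python
import numpy as np

x = np.array([-3.0, -1.0, 0.5, 2.0, 4.0])   # single feature, no bias (toy data)
y = np.array([0, 0, 1, 1, 1])

def squared_loss(theta):
    """(1/2n) * sum_i (sigmoid(theta * x_i) - y_i)^2 for a scalar theta."""
    h = 1.0 / (1.0 + np.exp(-theta * x))
    return np.mean((h - y) ** 2) / 2.0

thetas = np.linspace(-5, 5, 101)
J = np.array([squared_loss(t) for t in thetas])
print(thetas[np.argmin(J)])                  # best theta on the grid
```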
Finding the Cost Function via MLE
• The likelihood of the data is $L(\boldsymbol{\theta}) = \prod_{i=1}^{n} p(y^{(i)} \mid \boldsymbol{x}^{(i)}; \boldsymbol{\theta})$
• We seek the $\boldsymbol{\theta}$ that maximizes the likelihood:
$$\boldsymbol{\theta}_{MLE} = \arg\max_{\boldsymbol{\theta}} L(\boldsymbol{\theta}) = \arg\max_{\boldsymbol{\theta}} \prod_{i=1}^{n} p(y^{(i)} \mid \boldsymbol{x}^{(i)}; \boldsymbol{\theta})$$
$$\boldsymbol{\theta}_{MLE} = \arg\max_{\boldsymbol{\theta}} \log L(\boldsymbol{\theta}) = \arg\max_{\boldsymbol{\theta}} \log \prod_{i=1}^{n} p(y^{(i)} \mid \boldsymbol{x}^{(i)}; \boldsymbol{\theta}) = \arg\max_{\boldsymbol{\theta}} \sum_{i=1}^{n} \log p(y^{(i)} \mid \boldsymbol{x}^{(i)}; \boldsymbol{\theta})$$
Finding the Cost Function via MLE
• Each label $y^{(i)}$ is binary and equals 1 with probability $h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)})$
• Assume a Bernoulli likelihood:
$$p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{\theta}) = \prod_{i=1}^{n} p(y^{(i)} \mid \boldsymbol{x}^{(i)}; \boldsymbol{\theta}) = \prod_{i=1}^{n} h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)})^{y^{(i)}} \left(1 - h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)})\right)^{1 - y^{(i)}}$$
• The log-likelihood:
$$\ell(\boldsymbol{\theta}) = \sum_{i=1}^{n} \left[ y^{(i)} \log h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)})\right) \right]$$
The Cost Function
• Maximizing $\ell(\boldsymbol{\theta})$ is equivalent to minimizing the negative log-likelihood (NLL):
$$J(\boldsymbol{\theta}) = NLL(\boldsymbol{\theta}) = - \sum_{i=1}^{n} \left[ y^{(i)} \log h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)})\right) \right]$$
• Cost of a single instance:
$$cost(h_{\boldsymbol{\theta}}(\boldsymbol{x}), y) = \begin{cases} -\log h_{\boldsymbol{\theta}}(\boldsymbol{x}) & \text{if } y = 1 \\ -\log\left(1 - h_{\boldsymbol{\theta}}(\boldsymbol{x})\right) & \text{if } y = 0 \end{cases}$$
• The objective function:
$$J(\boldsymbol{\theta}) = \sum_{i=1}^{n} cost\left(h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)}), y^{(i)}\right)$$
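A compact NumPy sketch of this NLL cost (the function name nll_cost and the eps clipping are assumptions added to keep the computation numerically safe):

```python
import numpy as np

def nll_cost(theta, X, y, eps=1e-12):
    """J(theta) = -sum_i [ y_i*log(h_i) + (1 - y_i)*log(1 - h_i) ]."""
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    h = np.clip(h, eps, 1.0 - eps)          # avoid log(0)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
```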
Intuition
$$cost(h_{\boldsymbol{\theta}}(\boldsymbol{x}), y) = \begin{cases} -\log h_{\boldsymbol{\theta}}(\boldsymbol{x}) & \text{if } y = 1 \\ -\log\left(1 - h_{\boldsymbol{\theta}}(\boldsymbol{x})\right) & \text{if } y = 0 \end{cases}$$
• If $y = 1$
• $cost = 0$ for a correct, confident prediction ($h_{\boldsymbol{\theta}}(\boldsymbol{x}) = 1$)
• As $h_{\boldsymbol{\theta}}(\boldsymbol{x}) \to 0$, $cost \to \infty$
• Mistakes should get large penalties
• e.g., predicting $h_{\boldsymbol{\theta}}(\boldsymbol{x}) = 0$ when $y = 1$
Intuition
$$cost(h_{\boldsymbol{\theta}}(\boldsymbol{x}), y) = \begin{cases} -\log h_{\boldsymbol{\theta}}(\boldsymbol{x}) & \text{if } y = 1 \\ -\log\left(1 - h_{\boldsymbol{\theta}}(\boldsymbol{x})\right) & \text{if } y = 0 \end{cases}$$
• If $y = 0$
• $cost = 0$ for a correct, confident prediction ($h_{\boldsymbol{\theta}}(\boldsymbol{x}) = 0$)
• As $(1 - h_{\boldsymbol{\theta}}(\boldsymbol{x})) \to 0$, $cost \to \infty$
• Mistakes should get large penalties
• e.g., predicting $h_{\boldsymbol{\theta}}(\boldsymbol{x}) = 1$ when $y = 0$
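A tiny numeric illustration (assumed probability values) of how this per-instance cost penalizes confident mistakes far more than near-correct predictions:

```python
import numpy as np

def instance_cost(h, y):
    """-log(h) if y == 1, else -log(1 - h)."""
    return -np.log(h) if y == 1 else -np.log(1.0 - h)

print(instance_cost(0.99, 1))   # ~0.01: confident and correct, tiny cost
print(instance_cost(0.01, 1))   # ~4.6 : confident and wrong, large cost
print(instance_cost(0.01, 0))   # ~0.01: confident and correct for y = 0
```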
MAP formulation
Regularized Logistic Regression
• $J(\boldsymbol{\theta}) = - \sum_{i=1}^{n} \left[ y^{(i)} \log h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)})\right) \right]$
• We can regularize logistic regression as
$$J_{reg}(\boldsymbol{\theta}) = J(\boldsymbol{\theta}) + \lambda \sum_{j=1}^{d} \theta_j^2 = J(\boldsymbol{\theta}) + \lambda \|\boldsymbol{\theta}\|_2^2$$
$$J_{reg}(\boldsymbol{\theta}) = - \sum_{i=1}^{n} \left[ y^{(i)} \log h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)})\right) \right] + \lambda \sum_{j=1}^{d} \theta_j^2$$
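A minimal sketch of minimizing this regularized cost with batch gradient descent; the learning rate, iteration count, and the choice of plain gradient descent are assumptions, not prescribed by the slides:

```python
import numpy as np

def fit_logistic_l2(X, y, lam=0.1, lr=0.01, n_iters=1000):
    """Minimize J_reg(theta) = NLL(theta) + lam * ||theta||^2 by gradient descent.
    Regularizes all components (including any bias term) for simplicity;
    lr may need tuning for a given dataset."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iters):
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))          # predictions h_theta(x_i)
        grad = X.T @ (h - y) + 2.0 * lam * theta        # gradient of NLL + L2 penalty
        theta -= lr * grad
    return theta
```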
Estimating the Parameter
