Logistic Regression
Machine Learning
Lecture # 6
Spring 2024
Agenda
• Classification Problem: Logistic Regression
• Regularization
  – Regularization for Linear Regression
  – Regularization for Logistic Regression
Classification Problem: Logistic Regression
Classification
• Discrete outcomes.
• Binary: y ∈ {0, 1}, where 0 is the negative class and 1 is the positive class (e.g., normal / abnormal).
• Multi-class: e.g., a telescope that identifies whether an object in the night sky is a galaxy, star, or planet.
Hypothesis Representation
• Logistic regression model:
  h_θ(x) = g(θᵀx), where g(z) = 1 / (1 + e^{−z})
• g is the sigmoid (logistic) function: it maps any real value θᵀx into the interval (0, 1), and equals 0.5 at θᵀx = 0.
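As a concrete illustration, a minimal NumPy sketch of the sigmoid hypothesis (not from the slides; the function names are my own):

import numpy as np

def sigmoid(z):
    # Logistic function: maps any real value into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    # h_theta(x) = g(theta^T x); interpreted as P(y = 1 | x; theta).
    return sigmoid(np.dot(theta, x))

# Example: theta = [0, 1], x = [1, 2]  ->  g(2) ≈ 0.88
# print(hypothesis(np.array([0.0, 1.0]), np.array([1.0, 2.0])))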
• The probability that our prediction is 0 is just the complement of the probability that it is 1.
  For example, if the probability that it is 1 is 0.7, then the probability that it is 0 is 0.3.
Interpretation of Hypothesis Output
• h_θ(x) gives us the probability that our output is 1:
  h_θ(x) = P(y = 1 | x; θ) — the probability that y = 1, given x, parameterized by θ.
• For example, with features x = [x_0; x_1] = [1; tumourSize], h_θ(x) = 0.7 means a 70% chance that y = 1 (the tumour is abnormal).
• P(y = 0 | x; θ) = 1 − P(y = 1 | x; θ).
Binary Logistic Regression
• We have a set of feature vectors X with corresponding binary outputs y ∈ {0, 1}.
• By definition, the quantity we want to model is p = P(y = 1 | x), which is restricted to the range 0 ≤ p ≤ 1.
• We want to transform the probability to remove the range restriction, so that the transformed quantity can take any real value.
Using ODDS
• To remove the range restriction on p = P(y = 1 | x), we work with the odds of the positive class (and then the log-odds) rather than with p itself, as sketched below.
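The standard odds / log-odds transform (the specific notation here is mine, but the math is standard):
  odds = p / (1 − p), which ranges over (0, ∞)
  logit(p) = log( p / (1 − p) ), which ranges over (−∞, ∞)
The odds remove the upper bound at 1, and taking the logarithm removes the lower bound at 0, so the log-odds can take any real value.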
Hypothesis function (proof)
• Setting the log-odds equal to a linear function of the features, θᵀx, and solving for p yields the logistic hypothesis; the derivation is sketched below.
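A sketch of the derivation, assuming the model sets the log-odds equal to θᵀx (the standard assumption in logistic regression):
  log( p / (1 − p) ) = θᵀx
  p / (1 − p) = e^{θᵀx}
  p = e^{θᵀx} (1 − p)
  p (1 + e^{θᵀx}) = e^{θᵀx}
  p = e^{θᵀx} / (1 + e^{θᵀx}) = 1 / (1 + e^{−θᵀx})
So p = h_θ(x) = g(θᵀx): the probability of the positive class is the sigmoid of the linear combination.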
Hypothesis function
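In its standard form (consistent with the derivation above), the logistic regression hypothesis and decision rule are:
  h_θ(x) = g(θᵀx) = 1 / (1 + e^{−θᵀx}) = P(y = 1 | x; θ)
  Predict y = 1 when h_θ(x) ≥ 0.5, i.e., when θᵀx ≥ 0; predict y = 0 otherwise.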
Maximum Likelihood Estimation (MLE)
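The standard maximum-likelihood treatment (the slide's own derivation is assumed to follow the same steps): for m independent training examples,
  L(θ) = Π_{i=1}^{m} h_θ(x^{(i)})^{y^{(i)}} (1 − h_θ(x^{(i)}))^{1 − y^{(i)}}
  log L(θ) = Σ_{i=1}^{m} [ y^{(i)} log h_θ(x^{(i)}) + (1 − y^{(i)}) log(1 − h_θ(x^{(i)})) ]
Maximizing the log-likelihood is equivalent to minimizing the cost
  J(θ) = −(1/m) Σ_{i=1}^{m} [ y^{(i)} log h_θ(x^{(i)}) + (1 − y^{(i)}) log(1 − h_θ(x^{(i)})) ]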
Gradient Descent for Logistic Regression
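The gradient of J(θ) gives the usual simultaneous update, θ_j := θ_j − α (1/m) Σ_{i=1}^{m} (h_θ(x^{(i)}) − y^{(i)}) x_j^{(i)}. A minimal NumPy sketch of this batch update (illustrative only; the function and variable names are my own):

import numpy as np

def sigmoid(z):
    # Logistic function g(z) = 1 / (1 + e^(-z)).
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=1000):
    # X: (m, n) design matrix with a leading column of ones; y: (m,) labels in {0, 1}.
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)        # h_theta(x^(i)) for every example
        grad = (X.T @ (h - y)) / m    # (1/m) * sum_i (h - y) * x_j^(i), for all j at once
        theta -= alpha * grad         # simultaneous update of all theta_j
    return theta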
Multiclass Classification (one-vs-all)
• Train a separate binary logistic regression classifier h_θ^{(i)}(x) for each class i, treating class i as the positive class and all other classes as negative:
  h_θ^{(i)}(x) = P(y = i | x; θ)
• On a new input x, predict the class i that maximizes h_θ^{(i)}(x).
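A minimal sketch of one-vs-all built on the gradient_descent and sigmoid helpers from the previous sketch (illustrative only; names are my own):

import numpy as np

def one_vs_all(X, y, num_classes, alpha=0.1, iters=1000):
    # Train one binary classifier per class: class c is the positive class, everything else negative.
    thetas = [gradient_descent(X, (y == c).astype(float), alpha, iters)
              for c in range(num_classes)]
    return np.array(thetas)                      # shape (num_classes, n)

def predict_one_vs_all(thetas, X):
    # For each example, pick the class whose classifier reports the highest probability.
    return np.argmax(sigmoid(X @ thetas.T), axis=1)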
Addressing overfitting
• There are two main options to address the issue of overfitting:
1) Reduce the number of features:
– Manually select which features to keep.
– Use a model selection algorithm.
2) Regularization
– Keep all the features, but reduce the magnitude of the parameters θ_j.
– Regularization works well when we have a lot of slightly useful features.
Regularization (To avoid overfitting)
Regularization for Linear Regression
Regularization Intuition
• The regularization parameter λ determines how much the costs of our theta parameters are inflated.
• Example: we have two sets of parameters, θ = [1.35, 3.5] and θ = [45.2, 75.6] (a worked penalty computation follows below).
  – If λ is chosen to be 0, the cost function acts as usual, with no penalty on large parameters → it may choose the large ones, [45.2, 75.6].
  – If λ is chosen to be large, small parameter values are preferred over large ones → it chooses the small ones, [1.35, 3.5].
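To make the λ intuition concrete, here is the penalty term Σ_j θ_j² evaluated for the slide's two parameter sets (my own arithmetic):
  [1.35, 3.5]:  1.35² + 3.5² = 1.82 + 12.25 ≈ 14.1
  [45.2, 75.6]: 45.2² + 75.6² = 2043.0 + 5715.4 ≈ 7758.4
With any sizeable λ, the second set adds a vastly larger penalty to the cost, so minimizing the regularized cost steers the optimizer toward the small parameters.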
Regularization for Linear Regression
• Regularized cost function:
  J(θ) = (1/(2m)) [ Σ_{i=1}^{m} (h_θ(x^{(i)}) − y^{(i)})² + λ Σ_{j=1}^{n} θ_j² ]
• What if λ is set to an extremely large value (perhaps too large for our problem, say λ = 10^10)?
  It may smooth out the function too much and cause underfitting. Why?
  The penalty drives θ_1, …, θ_n toward 0, leaving h_θ(x) ≈ θ_0: a flat, horizontal line that underfits the data.
  (Figure: housing price vs. size of house, with the training points poorly fit by the flat line h_θ(x) = θ_0.)
Regularization for Linear Regression
• Gradient descent with regularization (repeat until convergence, updating all θ_j simultaneously):
  θ_0 := θ_0 − α (1/m) Σ_{i=1}^{m} (h_θ(x^{(i)}) − y^{(i)}) x_0^{(i)}
  θ_j := θ_j (1 − α λ/m) − α (1/m) Σ_{i=1}^{m} (h_θ(x^{(i)}) − y^{(i)}) x_j^{(i)},  j = 1, …, n
• The shrinkage factor (1 − α λ/m) is slightly less than 1, so each update shrinks θ_j a little before the usual gradient step.
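A minimal NumPy sketch of this regularized update for linear regression (illustrative only; names are my own, and θ_0 is deliberately left unregularized):

import numpy as np

def regularized_linear_gd(X, y, lam=1.0, alpha=0.01, iters=1000):
    # X: (m, n) design matrix with a leading column of ones; y: (m,) targets.
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = X @ theta                      # linear hypothesis
        grad = (X.T @ (h - y)) / m         # unregularized gradient
        reg = (lam / m) * theta            # penalty gradient, lambda/m * theta_j
        reg[0] = 0.0                       # theta_0 is not regularized
        theta -= alpha * (grad + reg)      # same as theta_j*(1 - alpha*lam/m) - alpha*grad_j
    return theta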
Regularization for Logistic Regression
(Figure: decision boundary in the x1–x2 feature plane.)
Cost function:
  J(θ) = −(1/m) Σ_{i=1}^{m} [ y^{(i)} log h_θ(x^{(i)}) + (1 − y^{(i)}) log(1 − h_θ(x^{(i)})) ] + (λ/(2m)) Σ_{j=1}^{n} θ_j²
• The added penalty term encourages small values of the parameters θ_1, …, θ_n.
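For completeness, the corresponding gradient descent update for regularized logistic regression (standard material) has exactly the same form as the regularized linear regression update, only with the sigmoid hypothesis h_θ(x) = 1 / (1 + e^{−θᵀx}) substituted in:
  θ_0 := θ_0 − α (1/m) Σ_{i=1}^{m} (h_θ(x^{(i)}) − y^{(i)}) x_0^{(i)}
  θ_j := θ_j (1 − α λ/m) − α (1/m) Σ_{i=1}^{m} (h_θ(x^{(i)}) − y^{(i)}) x_j^{(i)},  j = 1, …, n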