LogisticRegression_ExercisesSolutions
Solution:
(a) The sigmoid function is:
g(z) = \frac{1}{1 + e^{-z}}
(b) For z = 2.5:
g(2.5) = \frac{1}{1 + e^{-2.5}} = \frac{1}{1 + 0.0821} = 0.924
(c) Limits:
• As z → +∞: g(z) → 1
• As z → −∞: g(z) → 0
(d) The sigmoid function is ideal for binary classification because:
• It maps any real input to (0,1)
• The output can be interpreted as a probability
• It has a natural decision boundary at 0.5
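As a quick numerical check, here is a minimal Python sketch (standard library only) that evaluates the sigmoid at z = 2.5 and at large positive and negative inputs to illustrate parts (b) and (c):

import math

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(2.5))    # ≈ 0.924, matching part (b)
print(sigmoid(50.0))   # ≈ 1.0, the limit as z → +∞
print(sigmoid(-50.0))  # ≈ 0.0, the limit as z → −∞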
(a) Express z in terms of x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} using vector notation.
Solution:
(a) Linear combination in vector form:
z = w^\top x + b = \begin{bmatrix} 2 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - 3
Expanding:
z = 2x_1 - x_2 - 3
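A short numpy sketch of this linear combination, using the weights w = [2, −1] and bias b = −3 from the solution (the test input is arbitrary):

import numpy as np

w = np.array([2.0, -1.0])   # weight vector from the solution
b = -3.0                    # bias from the solution

def linear_combination(x):
    # z = w^T x + b
    return w @ x + b

x = np.array([1.0, 2.0])      # arbitrary example input
print(linear_combination(x))  # 2*1 - 1*2 - 3 = -3.0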
J(\hat{y}, y) = -\log(0.1) = 2.303
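This value can be checked with the standard binary cross-entropy loss; the sketch below assumes the case ŷ = 0.1 with true label y = 1, which is one way to arrive at −log(0.1):

import math

def cross_entropy(y_hat, y):
    # J(y_hat, y) = -(y*log(y_hat) + (1 - y)*log(1 - y_hat))
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

print(cross_entropy(0.1, 1))  # -log(0.1) ≈ 2.303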
Exercise 4: Gradient Descent
Given the dataset:
x1 x2 y
2 1 1
3 -1 0
1 2 1
4 0 0
Solution:
(a) Gradient descent update equations:
\frac{\partial J}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} \big(\hat{y}^{(i)} - y^{(i)}\big) x_j^{(i)}, \qquad
\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \big(\hat{y}^{(i)} - y^{(i)}\big)
For w_1:
w_1 := w_1 - \alpha \cdot \frac{1}{4} \sum_{i=1}^{4} \big(\hat{y}^{(i)} - y^{(i)}\big) x_1^{(i)}
    = 0 - 0.1 \cdot \frac{1}{4} \big[ (0.5 - 1)\cdot 2 + (0.5 - 0)\cdot 3 + (0.5 - 1)\cdot 1 + (0.5 - 0)\cdot 4 \big]
    = 0 - 0.1 \cdot \frac{1}{4} \big[ -1 + 1.5 - 0.5 + 2 \big]
    = 0 - 0.1 \cdot 0.5
    = -0.05.
For w_2:
w_2 := w_2 - \alpha \cdot \frac{1}{4} \sum_{i=1}^{4} \big(\hat{y}^{(i)} - y^{(i)}\big) x_2^{(i)}
    = 0 - 0.1 \cdot \frac{1}{4} \big[ (0.5 - 1)\cdot 1 + (0.5 - 0)\cdot (-1) + (0.5 - 1)\cdot 2 + (0.5 - 0)\cdot 0 \big]
    = 0 - 0.1 \cdot \frac{1}{4} \big[ -0.5 - 0.5 - 1 + 0 \big]
    = 0 - 0.1 \cdot (-0.5)
    = 0.05.
For b:
b := b - \alpha \cdot \frac{1}{4} \sum_{i=1}^{4} \big(\hat{y}^{(i)} - y^{(i)}\big)
    = 0 - 0.1 \cdot \frac{1}{4} \big[ (0.5 - 1) + (0.5 - 0) + (0.5 - 1) + (0.5 - 0) \big]
    = 0 - 0.1 \cdot \frac{1}{4} \big[ -0.5 + 0.5 - 0.5 + 0.5 \big]
    = 0 - 0.1 \cdot 0
    = 0.
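The full update step can be reproduced with a short numpy sketch, assuming the weights and bias start at zero and α = 0.1, as in the worked computation:

import numpy as np

# Dataset from the exercise: columns x1, x2 and labels y
X = np.array([[2.0, 1.0], [3.0, -1.0], [1.0, 2.0], [4.0, 0.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])

w = np.zeros(2)   # initial weights (assumed zero)
b = 0.0           # initial bias (assumed zero)
alpha = 0.1       # learning rate
m = len(y)

# Forward pass: with w = 0 and b = 0, every prediction is sigmoid(0) = 0.5
y_hat = 1.0 / (1.0 + np.exp(-(X @ w + b)))

# Gradients of the cross-entropy cost
dw = (X.T @ (y_hat - y)) / m
db = np.sum(y_hat - y) / m

# One gradient descent step
w -= alpha * dw
b -= alpha * db
print(w, b)  # [-0.05  0.05] 0.0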
(e) Most important metric when false positives are costly:
When false positives are costly, precision (TP / (TP + FP)) is the most important metric. High precision means that when the model predicts a positive outcome, the prediction is very likely to be correct, which keeps the number of false positives low.
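As an illustration, a minimal sketch computing precision from true and predicted labels (the labels here are made up for the example):

def precision(y_true, y_pred):
    # precision = true positives / (true positives + false positives)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
print(precision(y_true, y_pred))  # 2 true positives, 1 false positive → 2/3 ≈ 0.667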
Exercise 6: Regularization
(a) Write the L2 regularized cost function
(b) Explain λ’s effect on overfitting
Solution:
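(a) A standard form of the L2-regularized cross-entropy cost, consistent with the \frac{\lambda}{m} w_1 gradient term used in part (c) below:
J_{reg}(w, b) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log \hat{y}^{(i)} + \big(1 - y^{(i)}\big) \log\big(1 - \hat{y}^{(i)}\big) \Big] + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2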
(b) Effects of λ:
• Large λ: Stronger regularization, simpler model
• Small λ: Weaker regularization, more complex model
• λ = 0: No regularization (original model)
(c) For λ = 1.5, w_1 = 0.8:
• Regular gradient term: \frac{\partial J}{\partial w_1}
• Regularization term: \frac{\lambda}{m} w_1 = \frac{1.5}{m} \cdot 0.8
• Updated gradient: \frac{\partial J}{\partial w_1} + \frac{1.5}{m} w_1
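A small sketch of how the regularization term enters the weight gradient, assuming m = 4 training examples and a placeholder value for \partial J / \partial w_1 (neither is fixed by the exercise):

lam = 1.5        # regularization strength λ (given)
w1 = 0.8         # current weight (given)
m = 4            # number of training examples (assumed for illustration)
dJ_dw1 = 0.5     # placeholder value for the unregularized gradient

reg_term = (lam / m) * w1        # λ/m * w1 = 1.5/4 * 0.8 = 0.3
dJ_dw1_reg = dJ_dw1 + reg_term   # regularized gradient
print(reg_term, dJ_dw1_reg)      # 0.3 0.8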