
Logistic Regression: Exercises and Solutions

Prof. Abdelatif Hafid


ESISA
March 21, 2025

Exercise 1: Sigmoid Function


(a) Write the mathematical expression for the sigmoid function.¹
(b) Calculate g(z) when z = 2.5.
(c) What are the limits of g(z) as z approaches ±∞?
(d) Why is the sigmoid function useful for binary classification?

Solution:
(a) The sigmoid function is:

g(z) = \frac{1}{1 + e^{-z}}

(b) For z = 2.5:

g(2.5) = \frac{1}{1 + e^{-2.5}} = \frac{1}{1 + 0.0821} ≈ 0.924

(c) Limits:
• As z → +∞: g(z) → 1
• As z → -∞: g(z) → 0
(d) The sigmoid function is ideal for binary classification because:
• It maps any real input to (0,1)
• The output can be interpreted as a probability
• It has a natural decision threshold at 0.5
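The value in part (b) and the limits in part (c) can be verified numerically. The short Python sketch below is an added illustration, not part of the original exercise; the helper name sigmoid is an arbitrary choice.

import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(2.5))                   # ~0.924, matching part (b)
print(sigmoid(50.0), sigmoid(-50.0))  # ~1.0 and ~0.0, illustrating the limits in part (c)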

Exercise 2: Decision Boundary


For a logistic regression model with parameters represented as vectors:

w = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} = \begin{pmatrix} 2 \\ -1 \end{pmatrix}, \qquad b = -3

Answer the following:


¹ Sometimes denoted as σ(z).

(a) Express z in terms of x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} using vector notation.

(b) Find the decision boundary equation.


(c) Determine where the model predicts y = 1.

Solution:
(a) Linear combination in vector form:

z = w^T x + b = \begin{pmatrix} 2 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} - 3

Expanding:

z = 2x_1 - x_2 - 3

(b) Decision boundary:

g(z) = 0.5 ⇒ z = 0

Substituting z = 0:

2x_1 - x_2 - 3 = 0 ⇒ x_2 = 2x_1 - 3

(c) Predicting y = 1 (z > 0):

2x_1 - x_2 - 3 > 0 ⇒ x_2 < 2x_1 - 3
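As an illustrative check of parts (a)-(c), the Python sketch below evaluates z = w^T x + b for points on either side of the boundary x_2 = 2x_1 - 3; it is an added example, not part of the original solution.

import numpy as np

w = np.array([2.0, -1.0])
b = -3.0

def predict(x):
    # Return (z, predicted class) for a feature vector x = [x1, x2]
    z = w @ x + b            # z = 2*x1 - x2 - 3
    return z, int(z > 0)     # predict y = 1 when z > 0

print(predict(np.array([3.0, 1.0])))  # z = 2  > 0 -> y = 1 (here x2 < 2*x1 - 3)
print(predict(np.array([1.0, 2.0])))  # z = -3 < 0 -> y = 0 (here x2 > 2*x1 - 3)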

Exercise 3: Cost Function


For a single training example where y = 1 and h(x)² represents the predicted value:

(a) Calculate the loss when ŷ = 0.7
(b) Calculate the loss when ŷ = 0.1
(c) Explain why mean squared error isn't used
Solution:

(a) For ŷ = 0.7:

loss(ŷ, y) = -y \log(ŷ) - (1 - y) \log(1 - ŷ)
           = -(1) \log(0.7) - (0) \log(0.3)
           = -\log(0.7)
           ≈ 0.357

(b) For ŷ = 0.1:

loss(ŷ, y) = -\log(0.1) ≈ 2.303

(c) Mean squared error isn’t used because:


• It creates a non-convex optimization problem
• Multiple local minima make optimization unreliable
• Gradient descent may not find the global minimum
² Here, h(x^{(i)}) represents the predicted value and is also denoted as f_{w,b}(x^{(i)}) or ŷ^{(i)}.
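The losses in parts (a) and (b) can be reproduced with a few lines of Python (using natural logarithms, as in the solution above); this sketch is an added illustration and the function name log_loss is arbitrary.

import math

def log_loss(y_hat, y):
    # Binary cross-entropy loss for a single training example
    return -y * math.log(y_hat) - (1 - y) * math.log(1 - y_hat)

print(round(log_loss(0.7, 1), 3))  # 0.357, matching part (a)
print(round(log_loss(0.1, 1), 3))  # 2.303, matching part (b)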

Exercise 4: Gradient Descent
Given the dataset:
x1   x2   y
 2    1   1
 3   -1   0
 1    2   1
 4    0   0

(a) Write gradient descent update equations


(b) Calculate the first iteration (α = 0.1, starting from zeros)
(c) Describe the role of α
(d) Discuss potential issues with large α

Solution:
(a) Gradient descent update equations:

w_j := w_j - α \frac{∂J}{∂w_j}, \qquad b := b - α \frac{∂J}{∂b}

with gradients

\frac{∂J}{∂w_j} = \frac{1}{m} \sum_{i=1}^{m} (ŷ^{(i)} - y^{(i)}) x_j^{(i)}, \qquad \frac{∂J}{∂b} = \frac{1}{m} \sum_{i=1}^{m} (ŷ^{(i)} - y^{(i)})

where ŷ = g(z) and z = w_1 x_1 + w_2 x_2 + b.


(b) First iteration (α = 0.1, starting from w_1 := 0, w_2 := 0, b := 0):

ŷ^{(1)} = g(0) = 0.5,  ŷ^{(2)} = g(0) = 0.5,  ŷ^{(3)} = g(0) = 0.5,  ŷ^{(4)} = g(0) = 0.5.

For w_1:

w_1 := w_1 - α · \frac{1}{4} \sum_{i=1}^{4} (ŷ^{(i)} - y^{(i)}) x_1^{(i)}
    = 0 - 0.1 · \frac{1}{4} [(0.5 - 1)(2) + (0.5 - 0)(3) + (0.5 - 1)(1) + (0.5 - 0)(4)]
    = 0 - 0.1 · \frac{1}{4} [-1 + 1.5 - 0.5 + 2]
    = 0 - 0.1 · 0.5
    = -0.05.

For w_2:

w_2 := w_2 - α · \frac{1}{4} \sum_{i=1}^{4} (ŷ^{(i)} - y^{(i)}) x_2^{(i)}
    = 0 - 0.1 · \frac{1}{4} [(0.5 - 1)(1) + (0.5 - 0)(-1) + (0.5 - 1)(2) + (0.5 - 0)(0)]
    = 0 - 0.1 · \frac{1}{4} [-0.5 - 0.5 - 1 + 0]
    = 0 - 0.1 · (-0.5)
    = 0.05.

For b:

b := b - α · \frac{1}{4} \sum_{i=1}^{4} (ŷ^{(i)} - y^{(i)})
   = 0 - 0.1 · \frac{1}{4} [(0.5 - 1) + (0.5 - 0) + (0.5 - 1) + (0.5 - 0)]
   = 0 - 0.1 · 0
   = 0.
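The hand computation above can be reproduced numerically. The following Python sketch runs one batch gradient descent step on the four training examples; it is an added illustration with arbitrary variable names.

import numpy as np

# Dataset from the exercise: columns x1, x2 and labels y
X = np.array([[2.0, 1.0], [3.0, -1.0], [1.0, 2.0], [4.0, 0.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])

w = np.zeros(2)
b = 0.0
alpha = 0.1
m = len(y)

# One iteration of batch gradient descent
y_hat = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # all predictions are 0.5 at the zero starting point
grad_w = X.T @ (y_hat - y) / m
grad_b = np.mean(y_hat - y)

w = w - alpha * grad_w
b = b - alpha * grad_b
print(w, b)  # approximately [-0.05  0.05] and 0.0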

(c) The role of α (learning rate):


• Controls the step size in the gradient descent updates.
• Determines how quickly the model converges to the minimum of the cost function.
• A small α results in slow convergence, while a large α can cause divergence or overshoot.
(d) Potential issues with large α:
• Updates may overshoot the optimal solution.
• The parameters may oscillate around the minimum.
• Gradient descent might diverge, failing to converge to the minimum.

Exercise 5: Model Evaluation


Given the following confusion matrix:

              Predicted 0   Predicted 1
Actual 0           45             5
Actual 1           10            40

Calculate:
(a) Accuracy, Precision, Recall, and F1 Score
(b) The most important metric for scenarios with costly false positives
Solution:
(a) With TP = 40, TN = 45, FP = 5, FN = 10:

Accuracy = \frac{TP + TN}{Total} = \frac{45 + 40}{45 + 5 + 10 + 40} = \frac{85}{100} = 0.85 (85%)

Precision = \frac{TP}{TP + FP} = \frac{40}{40 + 5} = \frac{40}{45} ≈ 0.89 (89%)

Recall = \frac{TP}{TP + FN} = \frac{40}{40 + 10} = \frac{40}{50} = 0.80 (80%)

F1 = 2 · \frac{Precision · Recall}{Precision + Recall} = 2 · \frac{0.89 · 0.80}{0.89 + 0.80} = \frac{1.424}{1.69} ≈ 0.84 (84%)

(b) Most important metric for costly false positives:
In scenarios where false positives are costly, Precision is the most critical metric. Precision ensures that when the model predicts a positive outcome, it is highly likely to be correct, minimizing the impact of false positives.
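The metrics in part (a) can be double-checked with a short Python script that hard-codes the confusion-matrix counts from the exercise; this is an added illustration, not part of the original solution.

TP, TN, FP, FN = 40, 45, 5, 10

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)  # 0.85, ~0.889, 0.80, ~0.842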

Exercise 6: Regularization
(a) Write the L2 regularized cost function
(b) Explain λ’s effect on overfitting

(c) Calculate regularization impact with λ = 1.5, w1 = 0.8

Solution:

(a) L2 regularized cost function:

J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log(h(x^{(i)})) - (1 - y^{(i)}) \log(1 - h(x^{(i)})) \right] + \frac{λ}{2m} \sum_{j=1}^{n} w_j^2

(b) Effects of λ:
• Large λ: Stronger regularization, simpler model
• Small λ: Weaker regularization, more complex model
• λ = 0: No regularization (original model)
(c) For λ = 1.5, w_1 = 0.8:

• Regular gradient term: \frac{∂J}{∂w_1}
• Regularization term: \frac{λ}{m} w_1 = \frac{1.5}{m} · 0.8 = \frac{1.2}{m}
• Updated gradient: \frac{∂J}{∂w_1} + \frac{1.2}{m}
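For a concrete sense of the numbers in part (c), the Python sketch below assumes m = 4 training examples (as in Exercise 4) and uses a placeholder value for the unregularized gradient; both assumptions are illustrative only.

lam = 1.5   # regularization strength λ
w1 = 0.8
m = 4       # assumed number of training examples (e.g., the Exercise 4 dataset)

grad_unreg = 0.5               # placeholder for ∂J/∂w1 without regularization
reg_term = (lam / m) * w1      # (1.5 / 4) * 0.8 = 0.3
grad_reg = grad_unreg + reg_term

print(reg_term, grad_reg)      # 0.3 and 0.8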
