Logistic Regression: Gradient Descent Example

The document describes a step-by-step process for updating weights and bias in a binary classification model using a dataset with four samples. It includes calculations for the forward pass, binary cross-entropy cost, gradients, and updates for weights and bias across multiple samples. After one epoch, the final updated parameters are a weight of 0.0077 and a bias of 0.0561.


Example (1)

Assume
• We have one data point, with feature x = 0.5.
• Target label y = 1.
• Initial weight w = 0.2.
• Initial bias b = 0.1.
• Learning rate α = 0.1.

---------------------------------------------------------------------------------------------------------------------
Step 1: Forward Pass
1 Calculate the linear combination z:
𝑧 = 𝑤 ⋅ 𝑥 + 𝑏 = 0.2 ⋅ 0.5 + 0.1 = 0.2
2 Apply the sigmoid function σ(z) to get the prediction ŷ:
ŷ = σ(z) = 1 / (1 + e^(−z)) = 1 / (1 + e^(−0.2)) ≈ 0.5498
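
As a quick numerical check, this forward pass takes only a few lines of plain Python (a minimal sketch; variable names such as y_hat are illustrative, not part of the example):

```python
import math

w, b = 0.2, 0.1                   # initial weight and bias
x = 0.5                           # the single feature value

z = w * x + b                     # linear combination: 0.2 * 0.5 + 0.1 = 0.2
y_hat = 1 / (1 + math.exp(-z))    # sigmoid activation

print(f"z = {z:.1f}, y_hat = {y_hat:.4f}")   # z = 0.2, y_hat = 0.5498
```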

Step 2: Compute the Cost (Binary Cross-Entropy)


The Binary Cross-Entropy (BCE) cost function for one data point is:
BCE = −(y ⋅ log(ŷ) + (1 − y) ⋅ log(1 − ŷ))
Plugging in y = 1 and ŷ ≈ 0.5498:
BCE ≈ −(1 ⋅ log⁡(0.5498) + (1 − 1) ⋅ log⁡(1 − 0.5498)) ≈ −log⁡(0.5498) ≈ 0.5981
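
The cost can be verified the same way (a minimal sketch using the rounded prediction from Step 1):

```python
import math

y, y_hat = 1, 0.5498   # target and the rounded prediction from Step 1

# binary cross-entropy for a single sample
bce = -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))
print(f"BCE = {bce:.4f}")   # BCE = 0.5982 (0.5981 with the unrounded prediction)
```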

Step 3: Compute Gradients


To update the weights, we need the gradients of the BCE cost with respect to 𝑤 and 𝑏.
1 Gradient with respect to w:
∂BCE/∂w = (ŷ − y) ⋅ x = (0.5498 − 1) ⋅ 0.5 = −0.2251
2 Gradient with respect to b:
∂BCE/∂b = ŷ − y = 0.5498 − 1 = −0.4502
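
Both gradients are easy to check numerically (a minimal sketch, again using the rounded prediction):

```python
x, y = 0.5, 1
y_hat = 0.5498          # rounded prediction from Step 1

dw = (y_hat - y) * x    # ∂BCE/∂w = (ŷ − y) · x
db = (y_hat - y)        # ∂BCE/∂b = ŷ − y
print(f"dw = {dw:.4f}, db = {db:.4f}")   # dw = -0.2251, db = -0.4502
```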

Step 4: Update Weights and Bias


Using the learning rate 𝛼 = 0.1, we update 𝑤 and 𝑏 as follows:
1 Update w:
w = w − α ⋅ ∂BCE/∂w = 0.2 − 0.1 ⋅ (−0.2251) = 0.2 + 0.0225 = 0.2225
2 Update b:
b = b − α ⋅ ∂BCE/∂b = 0.1 − 0.1 ⋅ (−0.4502) = 0.1 + 0.0450 = 0.1450
Summary of Updated Parameters
After one iteration, the updated weights and bias are:
• 𝑤 = 0.2225
• 𝑏 = 0.1450
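
Putting the four steps together, the whole update can be written as one small function (a minimal sketch; gradient_step is an illustrative name, not part of the example):

```python
import math

def gradient_step(w, b, x, y, lr=0.1):
    """One gradient-descent step for logistic regression on a single sample."""
    z = w * x + b
    y_hat = 1 / (1 + math.exp(-z))    # forward pass (sigmoid)
    dw = (y_hat - y) * x              # ∂BCE/∂w
    db = (y_hat - y)                  # ∂BCE/∂b
    return w - lr * dw, b - lr * db   # gradient-descent update

w, b = gradient_step(w=0.2, b=0.1, x=0.5, y=1)
print(f"w = {w:.4f}, b = {b:.4f}")    # w = 0.2225, b = 0.1450
```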
Example (2)

The dataset has four samples:


Sample 𝑥 𝑦
1 0.5 1
2 1.5 0
3 2.0 1
4 3.0 0

Initial Conditions:
• Initial weight 𝑤 = 0.2
• Initial bias 𝑏 = 0.1
• Learning rate 𝛼 = 0.1
Goal:
We'll update the weights for each sample and go through one epoch of training.
---------------------------------------------------------------------------------------------------------------------

Step 1: Forward Pass, Prediction, and Cost Calculation


For each sample, we'll calculate the prediction 𝑦ˆ and the Binary Cross-Entropy cost.
Sample 1:
1 Calculate the linear combination 𝑧 :
𝑧 = 𝑤 ⋅ 𝑥 + 𝑏 = 0.2 ⋅ 0.5 + 0.1 = 0.2
2 Apply the sigmoid function to get 𝑦ˆ :
ŷ = σ(z) = 1 / (1 + e^(−0.2)) ≈ 0.5498
3 Compute the BCE Cost:
BCE = −(𝑦 ⋅ log⁡(𝑦ˆ) + (1 − 𝑦) ⋅ log⁡(1 − 𝑦ˆ))
With 𝑦 = 1 and 𝑦ˆ ≈ 0.5498 :
BCE ≈ −log⁡(0.5498) ≈ 0.5981
Sample 2:
1 Calculate 𝑧 :
𝑧 = 𝑤 ⋅ 𝑥 + 𝑏 = 0.2 ⋅ 1.5 + 0.1 = 0.4
2 Apply the sigmoid to get 𝑦ˆ :
ŷ = σ(z) = 1 / (1 + e^(−0.4)) ≈ 0.5987
3 Compute the BCE Cost: With 𝑦 = 0 and 𝑦ˆ ≈ 0.5987 :
BCE ≈ −log⁡(1 − 0.5987) ≈ 0.9130

Sample 3:
1 Calculate 𝑧 :
𝑧 = 𝑤 ⋅ 𝑥 + 𝑏 = 0.2 ⋅ 2.0 + 0.1 = 0.5
2 Apply the sigmoid to get 𝑦ˆ :
ŷ = σ(z) = 1 / (1 + e^(−0.5)) ≈ 0.6225
3 Compute the BCE Cost: With 𝑦 = 1 and 𝑦ˆ ≈ 0.6225 :
BCE ≈ −log⁡(0.6225) ≈ 0.4741
Sample 4:
1 Calculate 𝑧 :
𝑧 = 𝑤 ⋅ 𝑥 + 𝑏 = 0.2 ⋅ 3.0 + 0.1 = 0.7
2 Apply the sigmoid to get 𝑦ˆ :
ŷ = σ(z) = 1 / (1 + e^(−0.7)) ≈ 0.6682
3 Compute the BCE Cost: With 𝑦 = 0 and 𝑦ˆ ≈ 0.6682 :
BCE ≈ −log(1 − 0.6682) ≈ 1.1032
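
These four forward passes and costs can be checked with a short loop (a minimal Python sketch; variable names are illustrative):

```python
import math

data = [(0.5, 1), (1.5, 0), (2.0, 1), (3.0, 0)]   # (x, y) pairs from the table
w, b = 0.2, 0.1                                   # initial parameters

for x, y in data:
    y_hat = 1 / (1 + math.exp(-(w * x + b)))                       # forward pass
    bce = -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))   # per-sample cost
    print(f"x = {x}: y_hat = {y_hat:.4f}, BCE = {bce:.4f}")
```

This prints ŷ ≈ 0.5498, 0.5987, 0.6225, 0.6682 and BCE ≈ 0.5981, 0.9130, 0.4741, 1.1032, matching the values above.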

Step 2: Compute Gradients for Each Sample


Now we'll compute the gradients of the BCE cost with respect to 𝑤 and 𝑏 for each sample.
Sample 1:
1 Gradient with respect to 𝑤 :
∂BCE/∂w = (ŷ − y) ⋅ x = (0.5498 − 1) ⋅ 0.5 = −0.2251
2 Gradient with respect to b:
∂BCE/∂b = ŷ − y = 0.5498 − 1 = −0.4502
Sample 2:
1 Gradient with respect to 𝑤 :
∂BCE/∂w = (ŷ − y) ⋅ x = (0.5987 − 0) ⋅ 1.5 = 0.8980
2 Gradient with respect to b:
∂BCE/∂b = ŷ − y = 0.5987 − 0 = 0.5987
Sample 3:
1 Gradient with respect to 𝑤 :
∂BCE/∂w = (ŷ − y) ⋅ x = (0.6225 − 1) ⋅ 2.0 = −0.7550
2 Gradient with respect to b:
∂BCE/∂b = ŷ − y = 0.6225 − 1 = −0.3775
Sample 4:
1 Gradient with respect to 𝑤 :
∂BCE/∂w = (ŷ − y) ⋅ x = (0.6682 − 0) ⋅ 3.0 = 2.0046
2 Gradient with respect to b:
∂BCE/∂b = ŷ − y = 0.6682 − 0 = 0.6682

Step 3: Update Weights and Bias


Using the gradients and learning rate, we update 𝑤 and 𝑏 for each sample.
After Sample 1 Update:
1 Update 𝑤 :
w = w − α ⋅ ∂BCE/∂w = 0.2 − 0.1 ⋅ (−0.2251) = 0.2 + 0.0225 = 0.2225
2 Update b:
b = b − α ⋅ ∂BCE/∂b = 0.1 − 0.1 ⋅ (−0.4502) = 0.1 + 0.0450 = 0.1450
After Sample 2 Update:
1 Update 𝑤 :
𝑤 = 0.2225 − 0.1 ⋅ 0.8980 = 0.2225 − 0.0898 = 0.1327
2 Update 𝑏 :
𝑏 = 0.1450 − 0.1 ⋅ 0.5987 = 0.1450 − 0.0599 = 0.0851

After Sample 3 Update:


1 Update 𝑤 :
𝑤 = 0.1327 − 0.1 ⋅ (−0.755) = 0.1327 + 0.0755 = 0.2082
2 Update 𝑏 :
𝑏 = 0.0851 − 0.1 ⋅ (−0.3775) = 0.0851 + 0.03775 = 0.1229
After Sample 4 Update:
1 Update 𝑤 :
𝑤 = 0.2082 − 0.1 ⋅ 2.0046 = 0.2082 − 0.2005 = 0.0077
2 Update 𝑏 :
𝑏 = 0.1229 − 0.1 ⋅ 0.6682 = 0.1229 − 0.0668 = 0.0561
Summary of Updated Parameters
After one epoch, the updated weights and bias are:
• 𝑤 = 0.0077
• 𝑏 = 0.0561
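
For reference, the whole epoch can be reproduced with a short Python script (a minimal sketch that follows the text: all predictions and gradients are computed from the initial w = 0.2 and b = 0.1 in Steps 1–2, and the per-sample updates are then applied one after another in Step 3):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

data = [(0.5, 1), (1.5, 0), (2.0, 1), (3.0, 0)]   # (x, y) pairs from Example (2)
w, b, lr = 0.2, 0.1, 0.1

# Steps 1-2: predictions and gradients for every sample, all from the initial parameters
grads = []
for x, y in data:
    y_hat = sigmoid(w * x + b)
    grads.append(((y_hat - y) * x,    # ∂BCE/∂w for this sample
                  (y_hat - y)))       # ∂BCE/∂b for this sample

# Step 3: apply the per-sample updates one after another
for dw, db in grads:
    w -= lr * dw
    b -= lr * db

print(f"w = {w:.4f}, b = {b:.4f}")    # w = 0.0078, b = 0.0561
```

With full floating-point precision the final weight comes out as about 0.0078 rather than 0.0077; the small difference is only due to rounding to four decimals after each step in the worked example. Updating w and b immediately after each sample (plain stochastic gradient descent, recomputing each prediction with the latest parameters) would give slightly different intermediate numbers.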
