Module 4 Lab 2

This guide provides a comprehensive introduction to Gradient Descent, detailing its function as an optimization algorithm to minimize loss functions. It covers essential concepts such as the importance of partial derivatives, the process of updating parameters, and the difference between full batch and mini-batch gradient descent. Key takeaways emphasize the role of the learning rate, loss function, and the significance of visualizing training loss for effective model evaluation.

"Fully Explained Guide: Module 4 Lab 2 — Introduction to Gradient Descent (Updated and Expanded with All Your Queries Answered)" — update this with all the queries that I've asked on this Module 4 Lab 2.

Certainly! Here's a fully updated and beginner-friendly guide to Module 4 Lab 2 —
Introduction to Gradient Descent, incorporating all your queries and explanations from our
conversation.

📖 Fully Explained Guide: Module 4 Lab 2 —


Introduction to Gradient Descent
(With All Your Questions Answered and Examples Simplified)

🔍 Section 1 — What is Gradient Descent?

Gradient Descent is an optimization algorithm used to find the best parameters (like weights in a
model) that minimize a loss function (a measure of how wrong your model is).
How it works:
1. Start with random guesses for your parameters.
2. Calculate how “bad” your guess is using a loss function (like Mean Squared Error).
3. Compute the gradient (slope) of the loss with respect to each parameter—this tells you
which way to move to reduce the loss.
4. Update each parameter a little bit in the direction that reduces the loss.
5. Repeat until the loss stops getting smaller.
Update formula:

θ_new = θ_old − α · ∂L/∂θ

θ = parameter (like a, b, or c)
α = learning rate (step size)
∂L/∂θ = gradient: the partial derivative of the loss L with respect to θ
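To make the formula concrete, here is a tiny standalone sketch (not from the lab itself) that applies this update rule to minimize f(x) = x², whose gradient is 2x:

x = 5.0        # an arbitrary starting guess
alpha = 0.1    # learning rate (step size)

for step in range(50):
    grad = 2 * x          # slope of f(x) = x**2 at the current guess
    x = x - alpha * grad  # step in the direction that reduces f

print(x)  # very close to 0, the true minimum

Each step multiplies x by (1 − 0.1 × 2) = 0.8, so after 50 steps x has shrunk to nearly zero.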
🔍 Section 2 — Importing Required Libraries

import numpy as np
import matplotlib.pyplot as plt
import random

random.seed(42)
np.random.seed(42)

numpy: For numerical calculations.
matplotlib.pyplot: For plotting graphs.
random: For adding randomness.
Setting seeds ensures you get the same results every time.

🔍 Section 3 — Generating Data for a Quadratic Function

We want to fit data that follows a quadratic function y = ax² + bx + c, where the true values
of a, b, and c are fixed when the data is generated (so we know the "right answer" in advance).

We generate 100 random x values between -10 and 10.
For each x, we compute y = ax² + bx + c.
To make it realistic, we add a little random "jitter" (noise) to each y.
Why add noise?
Real-world data is never perfect; adding jitter simulates measurement errors. A code sketch of this step follows below.
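Here is one way to write this step (the true coefficients and noise level are illustrative choices, not necessarily the exact values the lab uses):

a_true, b_true, c_true = 2.0, 3.0, 5.0            # illustrative "true" coefficients
x = np.random.uniform(-10, 10, 100)               # 100 random x values in [-10, 10]
noise = np.random.normal(0, 3.0, size=x.shape)    # random jitter (measurement error)
y = a_true * x**2 + b_true * x + c_true + noise   # noisy quadratic data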

🔍 Section 4 — What Are We Trying to Learn?

We want to find the best coefficients a, b, and c of the quadratic equation y = ax² + bx + c
that fit our noisy data.

🔍 Section 5 — How Do We Measure “Best”? (The Loss Function)


We use Mean Squared Error (MSE):

MSE = (1/N) Σᵢ (yᵢ − ŷᵢ)²

yᵢ: Actual value
ŷᵢ: Predicted value from our current guess
Lower MSE = better fit.
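In code, the MSE is a one-liner; here it is as a small helper function (the name mse is ours, using the numpy import from Section 2):

def mse(y_true, y_pred):
    # average of the squared differences between actual and predicted values
    return np.mean((y_true - y_pred) ** 2)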
🔍 Section 6 — How Does Gradient Descent Improve Our Guess?

A. Start with a Guess


Set initial guesses for a, b, and c (for example, all zeros, or any starting values).

B. Make Predictions
For each x, compute ŷ = ax² + bx + c.

C. Calculate the Loss


Compute the MSE for all points.

D. Use Partial Derivatives to Guide Updates


For each parameter (a, b, c), calculate the partial derivative of the loss with respect to that
parameter.
This tells you: If I nudge this parameter up or down, does the loss get better or worse?
Update Rule for Each Parameter:

a ← a − α · ∂L/∂a (and likewise for b and c)

Example (illustrative numbers):
If ∂L/∂a = 4.0 and α = 0.1, then a decreases by 0.1 × 4.0 = 0.4.

E. Repeat
Keep repeating the steps (predict, compute loss, compute gradients, update) for many
iterations (epochs).
The loss should get smaller each time.
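Putting steps A–E together, here is a minimal sketch of the full-batch training loop, continuing from the data sketched in Section 3 (the learning rate and epoch count are illustrative; the gradient expressions come from differentiating the MSE, as explained in Section 7):

a, b, c = 0.0, 0.0, 0.0    # step A: starting guesses
alpha = 1e-4               # learning rate, kept small because x**2 values are large
losses = []

for epoch in range(20000):            # step E: repeat for many epochs
    y_pred = a * x**2 + b * x + c     # step B: predictions
    err = y_pred - y                  # prediction errors
    losses.append(np.mean(err**2))    # step C: MSE for this epoch

    grad_a = 2 * np.mean(err * x**2)  # step D: partial derivatives of the MSE
    grad_b = 2 * np.mean(err * x)
    grad_c = 2 * np.mean(err)

    a -= alpha * grad_a               # update each parameter a little bit downhill
    b -= alpha * grad_b
    c -= alpha * grad_c

The constant term c converges slowest here, which is why the epoch count is generous.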

🔍 Section 7 — Why Do We Use Partial Derivatives?

Partial derivatives tell us how much the loss will change if we change just one parameter.
They point in the direction of steepest increase; moving in the opposite direction reduces
the loss.
This is how gradient descent “knows” which way to step for each parameter.
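For this lab's quadratic model, those partial derivatives can be written out explicitly by differentiating the MSE (a standard calculus step, shown in LaTeX notation):

\[
L = \frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2, \qquad \hat{y}_i = a x_i^2 + b x_i + c
\]
\[
\frac{\partial L}{\partial a} = \frac{2}{N}\sum_i (\hat{y}_i - y_i)\,x_i^2, \qquad
\frac{\partial L}{\partial b} = \frac{2}{N}\sum_i (\hat{y}_i - y_i)\,x_i, \qquad
\frac{\partial L}{\partial c} = \frac{2}{N}\sum_i (\hat{y}_i - y_i)
\]

These are exactly the grad_a, grad_b, and grad_c expressions used in the loop sketch above.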

🔍 Section 8 — Example: Updating Parameters


Suppose after calculating, you get (with illustrative numbers; the exact values depend on your data):
Gradients: ∂L/∂a = 2.0, ∂L/∂b = −1.5, ∂L/∂c = 0.5
Learning rate: α = 0.1
Update:
a ← a − 0.1 × 2.0 = a − 0.20
b ← b − 0.1 × (−1.5) = b + 0.15
c ← c − 0.1 × 0.5 = c − 0.05

🔍 Section 9 — Full Batch vs. Mini-Batch Gradient Descent

Full Batch: Uses all data to compute the gradient in each update. No need to shuffle data;
order doesn’t matter.
Mini-Batch: Uses small, randomly selected subsets (mini-batches) to compute the gradient
and update parameters.
Shuffling is important to avoid biased batches (e.g., all one class in a batch).
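Here is how the inner loop of the earlier sketch changes for mini-batches, reusing the same x, y, a, b, c, and alpha (the batch size and epoch count are illustrative):

batch_size = 16
n = len(x)

for epoch in range(500):
    order = np.random.permutation(n)           # shuffle each epoch to avoid biased batches
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]  # indices for this mini-batch
        xb, yb = x[idx], y[idx]
        err = a * xb**2 + b * xb + c - yb      # errors on the mini-batch only
        a -= alpha * 2 * np.mean(err * xb**2)  # same update rule, noisier gradients
        b -= alpha * 2 * np.mean(err * xb)
        c -= alpha * 2 * np.mean(err)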

🔍 Section 10 — Visualizing Training Loss


Plot the MSE after each update (epoch).
You should see the loss decrease and flatten out as the model learns.
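Using the losses list collected in the training-loop sketch above:

plt.plot(losses)            # one MSE value per epoch
plt.xlabel("Epoch")
plt.ylabel("MSE")
plt.title("Training Loss")
plt.show()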

🔍 Section 11 — Final Model Evaluation

After training, print out the learned a, b, and c.

They should be close to the true values that were used to generate the data.
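Continuing the sketch (a_true, b_true, and c_true are the illustrative values chosen in Section 3):

print(f"Learned: a = {a:.3f}, b = {b:.3f}, c = {c:.3f}")
print(f"True:    a = {a_true}, b = {b_true}, c = {c_true}")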

🔍 Section 12 — Key Takeaways

Gradient Descent uses partial derivatives to update each parameter in the direction that
reduces loss.
Learning rate controls the size of each step.
Loss function (like MSE) measures how well the model fits the data.
Batch size (full vs. mini) affects how updates are computed and whether shuffling is
needed.
Visualization of loss helps you see if training is working.

Simple Analogy
Gradient descent is like finding the bottom of a valley (minimum loss) by feeling the slope (partial
derivatives) and always stepping downhill (negative gradient), adjusting your direction for each
parameter (a, b, c) separately.
If you want more details, code examples, or further clarification on any step, just ask!
