Gradient Descent Summary (L5)
Uploaded by omarobeidd03

Gradient Descent:

1) Gradient descent is an optimization algorithm used in deep learning to minimize a function,
usually a cost or loss function.
2) The goal is to find the optimal values of the model's parameters (like weights in a neural
network) that minimize the error between the model's predictions and the actual values.
3) Here's a simple breakdown of how gradient descent works:
a. Start with Initial Parameters: It begins with random values for the parameters of the model.
b. Calculate the Loss: For each set of parameters, the algorithm calculates the loss (or error),
which tells how far off the model's predictions are from the actual values.
c. Compute the Gradient: It calculates the gradient, which is the direction and rate of the
steepest increase of the loss function. The gradient is like a slope; it shows how the loss
changes with each parameter.
d. Update Parameters: The parameters are then updated in the opposite direction of the
gradient (downhill), because we want to minimize the loss. The size of each update is
controlled by a learning rate, a small number that dictates how big a step is taken toward
the minimum.
e. Repeat: The process is repeated many times until the loss is minimized or until further
updates are very small.
4) Learning Rate: Determines how big a step you take; too large can miss the minimum, too small
can take too long.
5) Convergence: The process stops when updates to the parameters become negligible or when a
set number of iterations is reached.
6) Gradient descent is like taking small steps down a slope to find the lowest point, gradually
moving closer to the minimum value of the function.
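The steps above can be sketched as a minimal loop in Python. The objective function y = (x - 10)**2, the starting point, and the learning rate below are illustrative assumptions, not values from the source:

```python
def dydx(x):
    return 2 * (x - 10)          # gradient of y = (x - 10)**2

x = 0.0                          # a. start with an initial parameter value
learning_rate = 0.1              # 4. controls how big each step is
for _ in range(1000):            # e. repeat
    grad = dydx(x)               # c. compute the gradient
    update = learning_rate * grad
    x = x - update               # d. step opposite the gradient (downhill)
    if abs(update) < 1e-6:       # 5. stop when updates become negligible
        break

print(x)                         # ends very close to the minimum at x = 10
```

Because the update is proportional to the slope, the steps automatically shrink as x approaches the minimum.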
7) Grid Search:
8) Remark: Grid search works only if the number of parameters is small, since the number of
combinations to evaluate grows rapidly with the parameter count.
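A sketch of plain grid search on the same toy function y = (x - 10)**2 illustrates the remark; the candidate grid is an assumption for the example:

```python
import itertools

def y(x):
    return (x - 10) ** 2

# One parameter: 41 candidate values are cheap to evaluate exhaustively.
candidates = [i * 0.5 for i in range(41)]   # 0.0, 0.5, ..., 20.0
best_x = min(candidates, key=y)
print(best_x)                               # 10.0 lies on the grid

# Each extra parameter multiplies the work: the grid becomes a Cartesian
# product, so the number of evaluations grows exponentially.
grid_2d = list(itertools.product(candidates, repeat=2))
print(len(grid_2d))                         # 41 * 41 = 1681
```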
9) Directional Grid Search Concept:
 Instead of testing all possible combinations as in regular grid search, directional grid search
incrementally adjusts the parameter value in a specific direction (up or down) until it converges
to a minimum.
 The process continues until the change in the value of the function 𝑦 becomes very small (less
than 0.0001). At this point, the function is considered to have reached the minimum.
 If the function doesn’t converge within a set number of iterations (e.g., 2000 iterations), the
search is considered a failure.
 Hyperparameter Adjustment:
A key hyperparameter here is the step size, which determines how much the value of
x changes in each iteration.
Too Small Step Size: If the step size is too small, the search takes a long time to converge,
making the process computationally expensive.
Too Large Step Size: If the step size is too large, the function might oscillate back and forth
around the minimum without ever settling, which makes convergence slow or impossible.
 Choosing the Step Size:
A moderate change value is preferred to ensure convergence without overshooting. For
example, increasing x by 0.01 in each iteration is considered reasonable.
10) Difference from Gradient Descent:
Directional Grid Search: Incrementally adjusts parameter values in one direction at a time,
similar to grid search but with a directional approach rather than testing all combinations.
Gradient Descent: Uses the gradient (slope) of the function to guide the direction and
magnitude of parameter updates, typically leading to faster convergence towards the minimum.
11)
 x0 is the initial value of x, set to 1.2345.
 y0 is the value of the objective function y= (x - 10)^2 evaluated at the initial x0.
 step = 0.01 defines the step size or increment used for adjusting x during the search.
 xs and ys are lists that will keep track of the values of x and y over iterations.
 xs.append(x0) and ys.append(y0) add the initial values to these lists.

 The function dydx(x) calculates the derivative of the objective function y=(x−10)**2, i.e.
2*(x−10). This gives the gradient or slope at any point x.

 The main loop runs up to 2000 iterations (from 1 to 2001).


 The current value of x is initially set to x0. Each iteration then checks the sign of the gradient (slope) using the dydx(x) function:
o If the gradient is positive (dydx(x) > 0), the slope is going upward, so we decrease x (move left) by step.
o If the gradient is negative or zero, the slope is going downward or flat, so we increase x (move right) by step.
 After updating x, the algorithm recalculates the value of the objective function y=(x−10)**2.
 The new values of x and y are appended to the xs and ys lists, respectively.
 It then checks whether the change in y between the last two iterations is smaller than 0.0001.
 If this condition is satisfied, the loop breaks, and the algorithm prints:
o The number of steps it took to find the minimum (i).
o The value of y at the minimum (ys[-1]).
o The optimal value of x (xs[-1]).
 If the loop reaches the maximum of 2000 iterations without finding a small enough change in y, it prints a failure message.
 Separate variables track the best (minimum) value of y, the iteration at which this minimum occurred (argmin_y), and the corresponding value of x (best_x).

Summary:
 The code is an implementation of directional grid search combined with gradient
descent.
 It minimizes the function y=(x−10)**2 by updating x based on the sign of the gradient:
o If the gradient is positive, it moves x left (decreases x).
o If the gradient is negative, it moves x right (increases x).
 The algorithm stops when the change in y between two iterations is small enough (less
than 0.0001) or after 2000 iterations if it fails to converge.
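The walkthrough above can be reconstructed as a runnable sketch. The original code is not shown in this summary, so the names and structure below follow the description (x0 = 1.2345, step = 0.01, tolerance 0.0001, cap of 2000 iterations):

```python
x0 = 1.2345
step = 0.01

def f(x):
    return (x - 10) ** 2        # objective function

def dydx(x):
    return 2 * (x - 10)         # its derivative (slope)

xs, ys = [x0], [f(x0)]          # track x and y over iterations
x = x0
converged = False
for i in range(1, 2001):        # up to 2000 iterations
    if dydx(x) > 0:             # slope goes upward: move left
        x -= step
    else:                       # slope downward or flat: move right
        x += step
    xs.append(x)
    ys.append(f(x))
    if abs(ys[-1] - ys[-2]) < 0.0001:   # change in y is negligible
        converged = True
        print(f"converged in {i} steps: y = {ys[-1]:.6f} at x = {xs[-1]:.4f}")
        break
if not converged:
    print("failed to converge within 2000 iterations")
```

With these values the fixed 0.01 step needs several hundred iterations to walk from 1.2345 to the neighborhood of 10, which is why the summary calls the step size the key hyperparameter.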

14) In simple mathematical functions, calculating the gradient (or slope) is straightforward using
basic calculus.
15) However, in more complex machine learning models, especially deep neural networks, manually
calculating gradients becomes very difficult.
16) TensorFlow is a powerful library that automates the calculation of gradients, making it easier to
implement gradient-based optimization techniques in complex models.
17) Code:
import tensorflow as tf
from IPython.display import Markdown as md

# Define the variable x, initially set to 2
tfx = tf.Variable(2, dtype='float32')

# Use GradientTape to record operations for automatic differentiation
with tf.GradientTape() as tape:
    ty = (tfx - 10)**2  # Define the function y = (x - 10)^2

# Compute the gradient of y with respect to x
dydx = tape.gradient(ty, tfx).numpy()

# Display the result
md(f"the gradient of the function $y=(x-10)^2$ at $x=2$ is {dydx}")

Gradient descent converged in 30 steps, while the earlier directional grid search took 877.
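For comparison, here is a plain gradient-descent loop with the same starting point and stopping rule as the directional search. The gradient is computed analytically rather than with GradientTape to keep the sketch self-contained, and the learning rate is not stated in the source; 0.1 is assumed here for illustration:

```python
x0, lr = 1.2345, 0.1    # same start as before; lr = 0.1 is an assumed value

def f(x):
    return (x - 10) ** 2

def dydx(x):
    return 2 * (x - 10)

ys = [f(x0)]
x = x0
for i in range(1, 2001):
    x = x - lr * dydx(x)               # step size scales with the slope
    ys.append(f(x))
    if abs(ys[-1] - ys[-2]) < 0.0001:  # same stopping rule as before
        break

print(i)    # stops after 30 iterations with this learning rate
```

Because each update is proportional to the slope, the steps start large far from the minimum and shrink near it, which is what lets gradient descent beat the fixed-step directional search by more than an order of magnitude here.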
