Gradient Descent Summary (L5)
Uploaded by omarobeidd03

Gradient Descent:

1) Gradient descent is an optimization algorithm used in deep learning to minimize a function,
usually a cost or loss function.
2) The goal is to find the optimal values of the model's parameters (like weights in a neural
network) that minimize the error between the model's predictions and the actual values.
3) Here's a simple breakdown of how gradient descent works:
a. Start with Initial Parameters: It begins with random values for the parameters of the model.
b. Calculate the Loss: For each set of parameters, the algorithm calculates the loss (or error),
which tells how far off the model's predictions are from the actual values.
c. Compute the Gradient: It calculates the gradient, which is the direction and rate of the
steepest increase of the loss function. The gradient is like a slope; it shows how the loss
changes with each parameter.
d. Update Parameters: The parameters are then updated in the opposite direction of the
gradient (downhill), because we want to minimize the loss. The size of each update is
controlled by a learning rate, a small number that dictates how big a step is taken toward
the minimum.
e. Repeat: The process is repeated many times until the loss is minimized or until further
updates are very small.
4) Learning Rate: Determines how big a step you take; too large can miss the minimum, too small
can take too long.
5) Convergence: The process stops when updates to the parameters become negligible or when a
set number of iterations is reached.
6) Gradient descent is like taking small steps down a slope to find the lowest point, gradually
moving closer to the minimum value of the function.
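The steps above can be sketched as a minimal loop in Python. The objective function y = (x - 10)**2, the starting point, and the learning rate below are illustrative assumptions, not values from the source:

```python
def dydx(x):
    return 2 * (x - 10)          # gradient of y = (x - 10)**2

x = 0.0                          # a. start with an initial parameter value
learning_rate = 0.1              # 4. controls how big each step is
for _ in range(1000):            # e. repeat
    grad = dydx(x)               # c. compute the gradient
    update = learning_rate * grad
    x = x - update               # d. step opposite the gradient (downhill)
    if abs(update) < 1e-6:       # 5. stop when updates become negligible
        break

print(x)                         # ends very close to the minimum at x = 10
```

Because the update is proportional to the slope, the steps automatically shrink as x approaches the minimum.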
7) Grid Search:
8) Remark: Grid search works only if the number of parameters is small, since the number of
combinations to evaluate grows rapidly with the parameter count.
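A sketch of plain grid search on the same toy function y = (x - 10)**2 illustrates the remark; the candidate grid is an assumption for the example:

```python
import itertools

def y(x):
    return (x - 10) ** 2

# One parameter: 41 candidate values are cheap to evaluate exhaustively.
candidates = [i * 0.5 for i in range(41)]   # 0.0, 0.5, ..., 20.0
best_x = min(candidates, key=y)
print(best_x)                               # 10.0 lies on the grid

# Each extra parameter multiplies the work: the grid becomes a Cartesian
# product, so the number of evaluations grows exponentially.
grid_2d = list(itertools.product(candidates, repeat=2))
print(len(grid_2d))                         # 41 * 41 = 1681
```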
9) Directional Grid Search Concept:
 Instead of testing all possible combinations as in regular grid search, directional grid search
incrementally adjusts the parameter value in a specific direction (up or down) until it converges
to a minimum.
 The process continues until the change in the value of the function 𝑦 becomes very small (less
than 0.0001). At this point, the function is considered to have reached the minimum.
 If the function doesn’t converge within a set number of iterations (e.g., 2000 iterations), the
search is considered a failure.
 Hyperparameter Adjustment:
A key hyperparameter here is the step size, which determines how much the value of
x changes in each iteration.
Too Small Step Size: If the step size is too small, the search takes a long time to converge,
making the process computationally expensive.
Too Large Step Size: If the step size is too large, the function might oscillate back and forth
around the minimum without ever settling, which makes convergence slow or impossible.
 Choosing the Step Size:
A moderate change value is preferred to ensure convergence without overshooting. For
example, increasing x by 0.01 in each iteration is considered reasonable.
10) Difference from Gradient Descent:
Directional Grid Search: Incrementally adjusts parameter values in one direction at a time,
similar to grid search but with a directional approach rather than testing all combinations.
Gradient Descent: Uses the gradient (slope) of the function to guide the direction and
magnitude of parameter updates, typically leading to faster convergence towards the minimum.
11)
 x0 is the initial value of x, set to 1.2345.
 y0 is the value of the objective function y= (x - 10)^2 evaluated at the initial x0.
 step = 0.01 defines the step size or increment used for adjusting x during the search.
 xs and ys are lists that will keep track of the values of x and y over iterations.
 xs.append(x0) and ys.append(y0) add the initial values to these lists.

 The function dydx(x) calculates the derivative of the objective function y=(x−10)**2, i.e.
2*(x−10). This gives the gradient or slope at any point x.

 The main loop runs up to 2000 iterations (from 1 to 2001).


 The current value of x is initially set to x0. Each iteration then checks the sign of the gradient (slope) using the dydx(x) function:
o If the gradient is positive (dydx(x) > 0), the slope is going upward, so we decrease x (move left) by step.
o If the gradient is negative or zero, the slope is going downward or flat, so we increase x (move right) by step.
 After updating x, the algorithm recalculates the value of the objective function y=(x−10)**2.
 The new values of x and y are appended to the xs and ys lists, respectively.
 It then checks whether the change in y between the last two iterations is smaller than 0.0001.
 If this condition is satisfied, the loop breaks, and the algorithm prints:
o The number of steps it took to find the minimum (i).
o The value of y at the minimum (ys[-1]).
o The optimal value of x (xs[-1]).
 If the loop reaches the maximum of 2000 iterations without finding a small enough change in y, it prints a failure message.
 Separate variables track the best (minimum) value of y, the iteration at which this minimum occurred (argmin_y), and the corresponding value of x (best_x).

Summary:
 The code is an implementation of directional grid search combined with gradient
descent.
 It minimizes the function y=(x−10)**2 by updating x based on the sign of the gradient:
o If the gradient is positive, it moves x left (decreases x).
o If the gradient is negative, it moves x right (increases x).
 The algorithm stops when the change in y between two iterations is small enough (less
than 0.0001) or after 2000 iterations if it fails to converge.
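The walkthrough above can be reconstructed as a runnable sketch. The original code is not shown in this summary, so the names and structure below follow the description (x0 = 1.2345, step = 0.01, tolerance 0.0001, cap of 2000 iterations):

```python
x0 = 1.2345
step = 0.01

def f(x):
    return (x - 10) ** 2        # objective function

def dydx(x):
    return 2 * (x - 10)         # its derivative (slope)

xs, ys = [x0], [f(x0)]          # track x and y over iterations
x = x0
converged = False
for i in range(1, 2001):        # up to 2000 iterations
    if dydx(x) > 0:             # slope goes upward: move left
        x -= step
    else:                       # slope downward or flat: move right
        x += step
    xs.append(x)
    ys.append(f(x))
    if abs(ys[-1] - ys[-2]) < 0.0001:   # change in y is negligible
        converged = True
        print(f"converged in {i} steps: y = {ys[-1]:.6f} at x = {xs[-1]:.4f}")
        break
if not converged:
    print("failed to converge within 2000 iterations")
```

With these values the fixed 0.01 step needs several hundred iterations to walk from 1.2345 to the neighborhood of 10, which is why the summary calls the step size the key hyperparameter.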

14) In simple mathematical functions, calculating the gradient (or slope) is straightforward using
basic calculus.
15) However, in more complex machine learning models, especially deep neural networks, manually
calculating gradients becomes very difficult.
16) TensorFlow is a powerful library that automates the calculation of gradients, making it easier to
implement gradient-based optimization techniques in complex models.
17) Code:
import tensorflow as tf
from IPython.display import Markdown as md

# Define the variable x, initially set to 2
tfx = tf.Variable(2, dtype='float32')

# Use GradientTape to record operations for automatic differentiation
with tf.GradientTape() as tape:
    ty = (tfx - 10)**2  # Define the function y = (x - 10)^2

# Compute the gradient of y with respect to x
dydx = tape.gradient(ty, tfx).numpy()

# Display the result
md(f"the gradient of the function $y=(x-10)^2$ at $x=2$ is {dydx}")

Gradient descent converged in 30 steps, while the earlier directional grid search took 877.
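For comparison, here is a plain gradient-descent loop with the same starting point and stopping rule as the directional search. The gradient is computed analytically rather than with GradientTape to keep the sketch self-contained, and the learning rate is not stated in the source; 0.1 is assumed here for illustration:

```python
x0, lr = 1.2345, 0.1    # same start as before; lr = 0.1 is an assumed value

def f(x):
    return (x - 10) ** 2

def dydx(x):
    return 2 * (x - 10)

ys = [f(x0)]
x = x0
for i in range(1, 2001):
    x = x - lr * dydx(x)               # step size scales with the slope
    ys.append(f(x))
    if abs(ys[-1] - ys[-2]) < 0.0001:  # same stopping rule as before
        break

print(i)    # stops after 30 iterations with this learning rate
```

Because each update is proportional to the slope, the steps start large far from the minimum and shrink near it, which is what lets gradient descent beat the fixed-step directional search by more than an order of magnitude here.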
