Module 4
Data Analytics
Topics in Module-4: Optimization
• Gradient descent
• Momentum
• Adagrad
• RMSprop
• Adam
• AMSGrad
Topics in Module-4: Optimization
Animation of 5 gradient descent methods on a surface: gradient descent (cyan), momentum (magenta), AdaGrad (white), RMSProp (green), Adam (blue). The left well is the global minimum; the right well is a local minimum.
Source: https://towardsdatascience.com/a-visual-explanation-of-gradient-descent-methods-momentum-adagrad-rmsprop-adam-f898b102325c
Optimization?
• Three components of an optimization problem: the objective function (to be minimized or maximized), the decision variables, and the constraints.
• Optimization is considered one of the three pillars of data science; linear algebra and statistics are the other two.
Module-4: Introduction to Optimization
Terminologies in Optimization
Working of Optimization: packing a lunchbox
• Goal: Pack the lunchbox with a variety of tasty and nutritious items so that you have a satisfying meal at lunchtime.
• Constraints: There are constraints, or limitations, to consider. For
example, the lunchbox has a fixed size, and you may want to include a
variety of items like a sandwich, fruits, vegetables, and a drink.
• Optimization: Optimization involves figuring out the best way to
arrange and pack these items to make the most of the limited space.
You might consider the size and shape of each item, how they fit
together, and how to use the available space efficiently.
Working of Optimization: packing a lunchbox
• Trade-offs: Sometimes, you might have to make trade-offs. For
instance, if you want to include a larger sandwich, you may need to
sacrifice space for other items.
• Optimization involves finding the right balance based on your
priorities.
• Outcome: The optimized lunchbox is the one that allows you to fit the
most satisfying and nutritious combination of items within the given
constraints.
• In this example, the lunchbox represents a problem or a task, and
optimization is the process of arranging and selecting items to achieve
the best outcome within the given limits.
• The concept of optimization applies to many real-world scenarios, from organizing your room to planning a schedule, or even solving more complex problems in fields like mathematics, engineering, and computer science.
Optimization – Scenario 2
• Requirement: Minimizing Cost
• Problem: A pizza delivery business wants to minimize the cost of delivering pizzas to different locations in a city.
• Objective: Minimize the total cost of delivering pizzas.
• Constraints:
• Each delivery has a fixed cost associated with it.
• There is a maximum distance a delivery person can travel in a given time.
• Each delivery has a time window during which it must be completed.
• Optimization Steps:
• Identify Costs: Understand the cost associated with each delivery, including
travel time, fuel, and other expenses.
• Define Constraints: Consider the limitations, such as the maximum distance and
time window for each delivery.
• Optimize Routes: Use optimization algorithms to find the most efficient routes
for the delivery persons, minimizing the total cost while adhering to constraints.
• Outcome: The optimized solution would provide the most cost-effective
way to deliver pizzas, ensuring that deliveries are made within the
specified time windows and without exceeding the maximum distance.
Optimization – Scenario 3
• Requirement: Maximizing Benefit
• Problem: A farmer with a limited amount of land wants to maximize the crop yield to get the highest profit.
• Objective: Maximize the total crop yield.
• Constraints:
• Limited land area available for cultivation.
• Each crop requires specific conditions (e.g., sunlight, water) and has a growth
period.
• Optimization Steps:
• Understand Crop Characteristics: Know the growth requirements and yield
potential of different crops.
• Consider Land Constraints: Take into account the limited land area available
for cultivation.
• Optimize Crop Selection: Use optimization techniques to choose the
combination of crops and their arrangement that maximizes the total yield
within the available land and time constraints.
• Outcome: The optimized solution would provide the farmer with the
most profitable combination of crops to plant, considering the
available land and the specific requirements of each crop.
Module-4: Introduction to Optimization
What is Optimization?
• Optimization means choosing the best element from a set of available alternatives: solving problems in which one seeks to minimize or maximize a real-valued function.
• In machine learning, optimization is the iterative process of training a model so that a cost function is driven to its minimum (or maximum) value.
Module-4: Introduction to Optimization
Effect of learning rate on Optimization
• The learning rate (α) is one such hyper-parameter: it defines the size of the adjustment made to the weights of our network with respect to the loss gradient.
• It determines how fast or slow we move towards the optimal weights.
• If the learning rate is very large, we may skip over the optimal solution.
• If it is too small, we will need too many iterations to converge to the best values, so using a good learning rate is crucial (a small sketch follows).
• Learning rates of 0.01 and 0.011, by contrast, are unlikely to yield vastly different results.
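Below is a minimal sketch (illustrative; the cost J(θ) = θ², the starting point, and the learning rates are assumptions, not from the slides) showing how the learning rate affects convergence:

# Effect of the learning rate on gradient descent for J(theta) = theta^2,
# whose gradient is dJ/dtheta = 2*theta.

def gradient_descent(lr, theta=5.0, steps=20):
    for _ in range(steps):
        grad = 2 * theta            # dJ/dtheta
        theta = theta - lr * grad   # update: theta <- theta - lr * grad
    return theta

for lr in (0.01, 0.4, 0.99, 1.1):
    print(f"learning rate {lr}: theta after 20 steps = {gradient_descent(lr):.4f}")
# 0.01 converges slowly, 0.4 converges quickly, 0.99 oscillates around the
# minimum while shrinking slowly, and 1.1 overshoots more each step and diverges.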
Module-4: Introduction to Optimization
Basic Optimization Algorithm and Example

F(x) = x_1^2 + 2 x_1 x_2 + 2 x_2^2 + x_1

Iterative update: x_{k+1} = x_k + α_k p_k, or equivalently Δx_k = x_{k+1} − x_k = α_k p_k,
where p_k is the search direction and α_k is the learning rate. For steepest descent, p_k = −g_k = −∇F(x_k).

Starting point x_0 = [0.5, 0.5]^T, learning rate α = 0.1.

Gradient: ∇F(x) = [∂F/∂x_1, ∂F/∂x_2]^T = [2x_1 + 2x_2 + 1, 2x_1 + 4x_2]^T,
so g_0 = ∇F(x)|_{x=x_0} = [3, 3]^T.

First two iterations:
x_1 = x_0 − α g_0 = [0.5, 0.5]^T − 0.1·[3, 3]^T = [0.2, 0.2]^T
x_2 = x_1 − α g_1 = [0.2, 0.2]^T − 0.1·[1.8, 1.2]^T = [0.02, 0.08]^T

[Figure: contour plot of F(x) showing the step α_k p_k from x_k to x_{k+1}.]
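A short NumPy sketch (illustrative) reproducing the two steepest-descent iterations above:

import numpy as np

# Steepest descent on F(x) = x1^2 + 2*x1*x2 + 2*x2^2 + x1 (the slide's example).
def grad_F(x):
    return np.array([2*x[0] + 2*x[1] + 1, 2*x[0] + 4*x[1]])

alpha = 0.1
x = np.array([0.5, 0.5])
for k in range(2):
    g = grad_F(x)
    x = x - alpha * g                  # x_{k+1} = x_k - alpha * g_k  (p_k = -g_k)
    print(f"x_{k + 1} = {x}")
# Prints x_1 = [0.2 0.2] and x_2 = [0.02 0.08], matching the worked example.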
Module-4: Introduction to Optimization
For a quadratic function the gradient is linear in x: ∇F(x) = A x + d.

The steepest-descent update then becomes
x_{k+1} = x_k − α g_k = x_k − α (A x_k + d), i.e. x_{k+1} = [I − αA] x_k − α d.

Stability is determined by the eigenvalues of the matrix [I − αA]: if z_i is an eigenvector of A with eigenvalue λ_i, then
[I − αA] z_i = z_i − α A z_i = z_i − α λ_i z_i = (1 − α λ_i) z_i.

Stability requirement: |1 − α λ_i| < 1 for all i, i.e. α < 2/λ_i, so α < 2/λ_max.
Module-4: Introduction to Optimization
For the example function, the Hessian is A = [[2, 2], [2, 4]] with λ_max ≈ 5.24, so the stability limit is

α < 2/λ_max = 2/5.24 ≈ 0.38

α = 0.37 (just below the limit: the iteration converges)    α = 0.39 (just above the limit: the iteration diverges)

[Figure: two contour plots over x_1, x_2 ∈ [−2, 2] showing the trajectory converging for α = 0.37 and diverging for α = 0.39.]
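A brief NumPy sketch (illustrative) checking this stability bound numerically: with λ_max ≈ 5.24, α = 0.37 converges while α = 0.39 diverges:

import numpy as np

# For the quadratic F(x) = x1^2 + 2*x1*x2 + 2*x2^2 + x1, the gradient is A x + d.
A = np.array([[2.0, 2.0], [2.0, 4.0]])
d = np.array([1.0, 0.0])

print("stability limit 2/lambda_max =", 2 / np.linalg.eigvalsh(A).max())   # ~0.38

for alpha in (0.37, 0.39):
    x = np.array([0.5, 0.5])
    for _ in range(100):
        x = x - alpha * (A @ x + d)        # x_{k+1} = [I - alpha*A] x_k - alpha*d
    print(f"alpha = {alpha}: x after 100 steps = {x.round(3)}")
# alpha = 0.37 ends near the minimum (-1, 0.5); alpha = 0.39 keeps growing.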
Module-4 Topic-1: Gradient descent
Gradient descent optimization
The gradient descent update rule is
θ = θ − α · ∂J(θ)/∂θ
where θ is the parameter we wish to update, ∂J/∂θ is the partial derivative telling us the rate of change of the error with respect to θ, and α is the learning rate. J represents the cost function, and there are multiple ways to calculate this cost; depending on how the cost is computed, we get different variants of gradient descent.
Module-4 Topic-1: Gradient descent
Gradient descent optimization
Setting the gradient of the quadratic approximation to zero gives
g_k + A_k Δx_k = 0  ⇒  Δx_k = −A_k^{−1} g_k,
so the update (Newton's method) is
x_{k+1} = x_k − A_k^{−1} g_k.
Module-4 Topic-1: Gradient descent
Example

F(x) = x_1^2 + 2 x_1 x_2 + 2 x_2^2 + x_1,   x_0 = [0.5, 0.5]^T

∇F(x) = [2x_1 + 2x_2 + 1, 2x_1 + 4x_2]^T,   g_0 = ∇F(x)|_{x=x_0} = [3, 3]^T

Hessian: A = [[2, 2], [2, 4]]

x_1 = x_0 − A^{−1} g_0 = [0.5, 0.5]^T − [[1, −0.5], [−0.5, 0.5]]·[3, 3]^T = [0.5, 0.5]^T − [1.5, 0]^T = [−1, 0.5]^T

[Figure: contour plot over x_1, x_2 ∈ [−2, 2] showing the single step from x_0 to x_1 = (−1, 0.5).]
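An illustrative NumPy sketch of this step (x_{k+1} = x_k − A^{−1} g_k): because the example cost is quadratic, a single step lands exactly on the minimum:

import numpy as np

# One Newton-style step x_1 = x_0 - A^{-1} g_0 on the slide's quadratic, where A is
# the (constant) Hessian of F and g_0 the gradient at x_0.
A = np.array([[2.0, 2.0], [2.0, 4.0]])               # Hessian of F
def grad_F(x):
    return np.array([2*x[0] + 2*x[1] + 1, 2*x[0] + 4*x[1]])

x0 = np.array([0.5, 0.5])
x1 = x0 - np.linalg.solve(A, grad_F(x0))             # solve A * step = g_0 instead of inverting A
print(x1)                                            # [-1.   0.5], the minimum, reached in one step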
Module-4- Topic-2 Variants of gradient descent
Variants of gradient descent
1. Batch gradient descent
• Batch gradient descent, also called vanilla gradient descent, is the simplest variant of gradient descent.
• In batch gradient descent, the entire training dataset is used to compute the gradients of the cost
function with respect to the model parameters in each iteration.
• This can be computationally expensive for large datasets, but it guarantees convergence to a local
minimum of the cost function.
Module-4- Topic-2 Variants of gradient descent
Variants of gradient descent
2. Stochastic Gradient Descent (SGD)
• In stochastic gradient descent, the gradients are computed on a single randomly selected training example in each iteration.
• Each update is very cheap but noisy, which is why mini-batch gradient descent (next) is often preferred as a compromise.
Module-4- Topic-2 Variants of gradient descent
Variants of gradient descent
3. Mini-batch Gradient Descent
• Mini-batch gradient descent is a compromise between batch gradient
descent and stochastic gradient descent.
• In mini-batch gradient descent, the gradients are computed on a small
random subset of the training dataset, typically between 10 and 1000
examples, called a mini-batch.
• This reduces the computational cost of the algorithm compared to batch gradient descent, while also
reducing the variance of the updates compared to SGD.
• Mini-batch gradient descent is widely used in deep learning because it strikes a good balance between convergence speed and stability (a sketch comparing the three variants follows).
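A compact sketch (illustrative; the data, model, learning rate, and batch sizes are assumptions, not from the slides) showing how batch, stochastic, and mini-batch gradient descent differ only in how many examples are used per update, here for simple linear regression:

import numpy as np

# Compare batch, mini-batch, and stochastic gradient descent on linear regression.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                      # toy inputs
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=1000)        # targets with a little noise

def gradient_descent(batch_size, lr=0.05, epochs=100):
    """batch_size = len(X): batch GD; = 1: stochastic GD; in between: mini-batch GD."""
    w = np.zeros(3)
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)                  # shuffle each epoch
        for start in range(0, n, batch_size):
            b = order[start:start + batch_size]
            err = X[b] @ w - y[b]
            grad = X[b].T @ err / len(b)            # gradient of the mean squared error
            w -= lr * grad
    return w

for bs in (1000, 32, 1):                            # batch, mini-batch, stochastic
    print(f"batch_size={bs}: w = {gradient_descent(bs).round(2)}")
# All three should land near the true weights [1, -2, 0.5]; they differ in the
# cost per update and in how noisy the individual updates are.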
Module-4- Topic-2 Variants of gradient descent
Variants of gradient descent
4. Nesterov Accelerated Gradient (NAG)
• Nesterov accelerated gradient is an extension of momentum gradient descent that takes into account
the future gradient values when computing the momentum term.
• This helps to reduce overshooting and can lead to faster convergence than momentum gradient descent (a sketch of the look-ahead update follows).
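An illustrative sketch of the Nesterov update: the gradient is evaluated at the look-ahead point x − βv rather than at x (β, the learning rate, and the reused quadratic cost are assumptions):

import numpy as np

# Nesterov accelerated gradient: evaluate the gradient at the "look-ahead" point
# (x - beta * v) instead of at x, then apply a momentum-style update.
def grad_F(x):                           # gradient of the earlier quadratic example
    return np.array([2*x[0] + 2*x[1] + 1, 2*x[0] + 4*x[1]])

lr, beta = 0.02, 0.9
x = np.array([0.5, 0.5])
v = np.zeros(2)
for _ in range(200):
    g_lookahead = grad_F(x - beta * v)   # anticipatory gradient
    v = beta * v + lr * g_lookahead
    x = x - v
print(x)                                 # approaches the minimum at [-1, 0.5]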
Module-4- Topic-3 Momentum
Variants of gradient descent
5. Momentum Gradient Descent
• Momentum gradient descent is a variant of gradient descent
that adds a momentum term to the update rule.
• The momentum term accumulates the gradient values over
time and dampens the oscillations in the cost function, leading
to faster convergence. This is particularly useful in cases where
the cost function has a lot of noise or curvature, which can
cause traditional gradient descent to get stuck in local minima (a sketch of the update rule follows).
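A minimal sketch of the momentum update (illustrative; the momentum coefficient β = 0.9, the learning rate, and the quadratic cost reused from the earlier example are assumptions):

import numpy as np

# Momentum gradient descent: accumulate an exponentially weighted sum of past
# gradients in v, then step along v. (One common formulation; some texts fold
# the learning rate into v instead.)
def grad_F(x):
    return np.array([2*x[0] + 2*x[1] + 1, 2*x[0] + 4*x[1]])

lr, beta = 0.02, 0.9
x = np.array([0.5, 0.5])
v = np.zeros(2)
for _ in range(200):
    v = beta * v + grad_F(x)        # momentum term dampens oscillations
    x = x - lr * v
print(x)                            # approaches the minimum at [-1, 0.5]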
Module-4- Topic-4 Adagrad
Variants of gradient descent
6. Adagrad
• Adagrad is a variant of gradient descent that adapts the
learning rate for each parameter based on its historical
gradient values.
• Parameters with large gradients have their learning rates
reduced, while parameters with small gradients have
their learning rates increased. This helps to normalize
the updates and can be useful in cases where the cost
function has a lot of curvature or different scales of
gradients (a sketch of the update follows).
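An illustrative Adagrad sketch (the learning rate and ε are assumed values; the quadratic cost is reused from the earlier example):

import numpy as np

# Adagrad: accumulate the squared gradients per parameter and divide each step
# by the square root of that history, so parameters with large past gradients get smaller steps.
def grad_F(x):
    return np.array([2*x[0] + 2*x[1] + 1, 2*x[0] + 4*x[1]])

lr, eps = 0.5, 1e-8
x = np.array([0.5, 0.5])
G = np.zeros(2)                          # running sum of squared gradients
for _ in range(500):
    g = grad_F(x)
    G += g * g
    x = x - lr * g / (np.sqrt(G) + eps)  # per-parameter adaptive learning rate
print(x)                                 # approaches [-1, 0.5]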
Module-4- Topic-5 RMSProp
Variants of gradient descent
7. RMSProp
• RMSProp is a variant of gradient descent that
also adapts the learning rate for each parameter,
but instead of using the historical gradient values, it
uses a moving average of the squared gradient
values. This helps to reduce the learning rate for
parameters that have large squared gradient values,
which can cause the algorithm to oscillate or diverge (a sketch of the update follows).
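An illustrative RMSProp sketch (the decay ρ = 0.9, ε, and the learning rate are assumed values):

import numpy as np

# RMSProp: replace Adagrad's ever-growing sum with an exponentially decaying
# moving average of squared gradients.
def grad_F(x):
    return np.array([2*x[0] + 2*x[1] + 1, 2*x[0] + 4*x[1]])

lr, rho, eps = 0.01, 0.9, 1e-8
x = np.array([0.5, 0.5])
s = np.zeros(2)                          # moving average of squared gradients
for _ in range(1000):
    g = grad_F(x)
    s = rho * s + (1 - rho) * g * g
    x = x - lr * g / (np.sqrt(s) + eps)
print(x)   # hovers close to the minimum at [-1, 0.5]; with a fixed learning rate
           # RMSProp circles the optimum rather than settling on it exactly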
Module-4- Topic-6 Adam
Variants of gradient descent
8. Adam
• Adam (Adaptive Moment Estimation) is a variant of gradient descent that combines the ideas of Adagrad and RMSProp.
• It adapts the learning rate for each parameter based on the historical gradient values and also uses a
moving average of the gradient values to compute the momentum term.
• Adam is one of the most widely used optimization algorithms in deep learning because it is efficient, stable, and robust to different types of cost functions and datasets (a sketch of the update follows).
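An illustrative Adam sketch using commonly quoted default hyper-parameters (β1 = 0.9, β2 = 0.999, ε = 1e-8; these and the learning rate are assumptions, not from the slides):

import numpy as np

# Adam: a momentum-like first-moment estimate m and an RMSProp-like second-moment
# estimate v, both bias-corrected, drive a per-parameter adaptive step.
def grad_F(x):
    return np.array([2*x[0] + 2*x[1] + 1, 2*x[0] + 4*x[1]])

lr, b1, b2, eps = 0.02, 0.9, 0.999, 1e-8
x = np.array([0.5, 0.5])
m = np.zeros(2)                        # moving average of gradients
v = np.zeros(2)                        # moving average of squared gradients
for t in range(1, 2001):
    g = grad_F(x)
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)          # bias correction for the early iterations
    v_hat = v / (1 - b2 ** t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
print(x)                               # ends up close to the minimum at [-1, 0.5]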
Module-4- Topic-7 AMSGrad
Variants of gradient descent
9. AMSGrad
• AMSGrad is an extension of the Adam version of gradient descent that attempts to improve the convergence properties of the algorithm by avoiding large, abrupt changes in the effective learning rate for each parameter (a sketch of the update follows).
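An illustrative AMSGrad sketch: the only change from the Adam sketch above is that the second-moment estimate is replaced by the running maximum of all past estimates, so the effective learning rate for each parameter can only shrink (hyper-parameters are assumed defaults):

import numpy as np

# AMSGrad: like Adam, but normalise by the elementwise maximum of all past
# second-moment estimates, preventing sudden increases in the effective step size.
def grad_F(x):
    return np.array([2*x[0] + 2*x[1] + 1, 2*x[0] + 4*x[1]])

lr, b1, b2, eps = 0.02, 0.9, 0.999, 1e-8
x = np.array([0.5, 0.5])
m = np.zeros(2)
v = np.zeros(2)
v_max = np.zeros(2)                    # running maximum of the second-moment estimate
for _ in range(2000):
    g = grad_F(x)
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    v_max = np.maximum(v_max, v)       # the key difference from Adam
    x = x - lr * m / (np.sqrt(v_max) + eps)
print(x)                               # converges to the minimum at [-1, 0.5]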
Module-4- Summary
• Introduction to Optimization
• Gradient Descent: a first-order, iterative optimization algorithm
• Variants of Gradient Descent: batch gradient descent, mini-batch gradient descent and stochastic gradient descent
• Momentum Optimizer: accelerates stochastic gradient descent in the relevant direction; NAG uses the momentum term for an anticipatory update
• Adagrad: adaptively scales the learning rate for each dimension
• Adadelta: the sum of gradients is recursively defined as a decaying average of past gradients
• RMSProp: same first update rule as Adadelta
• Adam: a combination of RMSProp and momentum
• AMSGrad: considers the maximum of past squared gradients