
Unit-5

(Advanced Optimization Solutions): Challenges in Gradient-Based Optimization for Machine Learning, Adjusting First-Order Derivatives for Descent, Newton Method, Newton Methods in Machine Learning, Newton Method: Challenges and Solutions, Computationally Efficient Variations of Newton Method, Non-Differentiable Optimization Functions.
Challenges in Gradient-Based Optimization for Machine Learning

Nonlinear regression algorithms, which fit curves that are not linear in their parameters, are more complicated than linear regression because they cannot be solved with a closed-form, deterministic method. Instead, nonlinear regression algorithms implement some kind of iterative minimization process, typically gradient descent, which brings its own challenges:
Optimum Learning Rate: We must choose a suitable learning rate; too small a value makes convergence slow, while too large a value can make the iterates overshoot or diverge.
Constant Learning Rate: All the parameters share the same constant learning rate, but there may be some parameters that we do not want to change at the same rate (see the sketch below).
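As a concrete illustration of the role of the learning rate, the following is a minimal sketch of batch gradient descent for least-squares regression with one constant learning rate shared by all parameters (the data, names, and values are illustrative assumptions, not from the original material):

import numpy as np

def gradient_descent(X, y, lr=0.01, n_iters=1000):
    # Minimize the mean squared error with a single constant learning rate.
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = 2.0 / len(y) * X.T @ (X @ w - y)   # gradient of the MSE loss
        w -= lr * grad                            # every parameter uses the same lr
    return w

# Usage: fit y = 1 + 2*x on five toy points.
X = np.c_[np.ones(5), np.arange(5.0)]
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
print(gradient_descent(X, y, lr=0.05, n_iters=5000))  # approximately [1.0, 2.0]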
Adjusting First-Order Derivatives for Descent
This technique, known as momentum, is used to reduce the fluctuations of the stochastic gradient descent algorithm and help it keep moving in the relevant direction. This is done by adding the update vector of the previous step to the current one, after multiplying it by a momentum constant that is usually set to a value close to 0.9.
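A minimal sketch of this momentum update, assuming a generic gradient function grad_f supplied by the caller (the function names and the test objective are illustrative, not from the original material):

import numpy as np

def sgd_momentum(grad_f, w0, lr=0.01, beta=0.9, n_iters=1000):
    # Gradient descent with momentum: v <- beta*v + lr*grad, w <- w - v.
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(n_iters):
        g = grad_f(w)
        v = beta * v + lr * g   # add a scaled copy of the previous update vector
        w = w - v
    return w

# Usage on a simple quadratic f(w) = ||w||^2, whose gradient is 2w.
print(sgd_momentum(lambda w: 2 * w, w0=[5.0, -3.0]))  # close to [0, 0]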
Using the Newton Method to Determine the Optimal Solution
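Newton's method determines the next iterate from both the gradient and the Hessian of the objective: x_{k+1} = x_k - H(x_k)^(-1) ∇f(x_k). Below is a minimal sketch of this iteration on a small illustrative quadratic (the objective and all names are assumptions for demonstration, not from the original slides):

import numpy as np

def newton_method(grad, hess, x0, n_iters=20, tol=1e-8):
    # Newton's method: x <- x - H(x)^(-1) * grad(x).
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        step = np.linalg.solve(hess(x), g)  # solve H * step = g instead of inverting H
        x = x - step
    return x

# Illustrative objective: f(x, y) = (x - 2)^2 + 10*(y + 1)^2.
grad = lambda x: np.array([2 * (x[0] - 2), 20 * (x[1] + 1)])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 20.0]])
print(newton_method(grad, hess, x0=[0.0, 0.0]))  # close to [2, -1]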
Newton Method: Challenges and Solutions

Newton's method has a favorable convergence rate, but its per-iteration cost can be much higher than that of methods like gradient descent. The main reasons for this are:
• Computational Cost per Iteration: Each iteration of Newton's method requires the computation of both the gradient and the Hessian of the function. For functions with a large number of variables, computing the Hessian can be computationally expensive, especially if it's dense.
• Storage Requirements: Storing and manipulating the Hessian matrix can be memory-intensive, especially for functions with a large number of variables. This can become a bottleneck for high-dimensional optimization problems.
• Numerical Stability: The numerical computation of the Hessian can introduce errors, especially if the function has regions of high curvature or ill-conditioned Hessian matrices. Ensuring numerical stability adds computational overhead.
Some solutions to these challenges include:
• Using successive over-relaxation to stabilize Newton's method
• Using the secant method, which converges more slowly than Newton's method but can be used when the derivative can only be approximated (see the sketch after this list)
• Reviewing the assumptions made in the proof of quadratic convergence before implementing Newton's method
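As a concrete illustration of the secant idea from the list above, the sketch below minimizes a one-dimensional function by applying the secant update to its first derivative, so no second derivative is required (the objective and names are illustrative assumptions):

def secant_minimize(df, x0, x1, n_iters=50, tol=1e-10):
    # Find a stationary point of f by the secant method on its derivative df.
    # The second derivative is approximated by (df(x1) - df(x0)) / (x1 - x0).
    for _ in range(n_iters):
        d0, d1 = df(x0), df(x1)
        if abs(d1) < tol or abs(d1 - d0) < tol:   # stationary point or flat secant
            break
        x0, x1 = x1, x1 - d1 * (x1 - x0) / (d1 - d0)  # secant update
    return x1

# Illustrative objective: f(x) = x^4 - 3x^2 + 2, with derivative 4x^3 - 6x.
print(secant_minimize(lambda x: 4 * x**3 - 6 * x, x0=1.0, x1=2.0))  # about 1.2247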
Non-Differentiable Optimization Functions
• Non-differentiable optimization is a category of optimization that deals with objective functions that, for a variety of reasons, are not differentiable everywhere.
• The functions in this class of optimization are generally non-smooth.
• These functions, although continuous, often contain sharp points or corners at which no well-defined tangent exists, and they are thus non-differentiable at those points (see the example below).
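A familiar example of such a corner is the absolute-value function |x| (and, in machine learning, the ReLU and hinge losses): at x = 0 the one-sided slopes disagree, so no single tangent line exists. The snippet below simply checks this numerically (an illustrative check, not part of the original material):

import numpy as np

f = np.abs                          # |x| has a corner at x = 0
h = 1e-6
left_slope = (f(0) - f(-h)) / h     # -> -1.0
right_slope = (f(h) - f(0)) / h     # -> +1.0
print(left_slope, right_slope)      # the one-sided slopes disagree, so f is not differentiable at 0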
Sub-gradient Method
The subgradient method is an optimization technique used for minimizing non-smooth convex functions. It is particularly useful when the objective function is non-differentiable. A subgradient coincides with the ordinary gradient at every point where the convex function is differentiable; at non-differentiable points there can be many valid subgradients.
Assumptions: The function f is convex.
1. Initialization: Start with an initial point x0 and step size t0.
2. Iterative Update: At each iteration k:
   1. Compute a subgradient gk of the function f(x) at the current point xk.
   2. Update the current point using the rule x_{k+1} = x_k - t_k * g_k.
   3. Update the step size tk according to a predefined rule (e.g., a diminishing schedule).
3. Convergence Check: Repeat the updates until a stopping criterion is met, such as reaching a maximum number of iterations or observing a sufficiently small change in the function value (see the sketch below).
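A minimal sketch of these steps, using the l1-norm f(x) = ||x||_1 as the non-smooth convex objective, sign(x) as a valid subgradient, and a diminishing step-size rule t_k = t0 / (k + 1); all of these specific choices are illustrative assumptions:

import numpy as np

def subgradient_method(f, subgrad, x0, t0=1.0, n_iters=500):
    # Subgradient method: x_{k+1} = x_k - t_k * g_k with a diminishing step size.
    x = np.asarray(x0, dtype=float)
    best_x, best_f = x.copy(), f(x)
    for k in range(n_iters):
        g = subgrad(x)                        # step 2.1: a subgradient at the current point
        t = t0 / (k + 1)                      # step 2.3: diminishing step-size rule
        x = x - t * g                         # step 2.2: update the current point
        if f(x) < best_f:                     # subgradient steps are not guaranteed to descend,
            best_x, best_f = x.copy(), f(x)   # so keep the best point seen so far
    return best_x

# Example: minimize f(x) = ||x||_1, a non-smooth convex function.
f = lambda x: np.sum(np.abs(x))
subgrad = lambda x: np.sign(x)                # a valid subgradient of the l1-norm
print(subgradient_method(f, subgrad, x0=[3.0, -2.0]))  # close to [0, 0]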
Unit 6
Primal Gradient Descent Methods, Lagrangian Relaxation and Duality, Machine Learning Applications.
