
Lecture 7: Second Order Methods

Nicholas Ruozzi
University of Texas at Dallas
Gradient Descent

Gradient Descent Algorithm:

• Pick an initial point $x^{(0)}$
• Iterate until convergence: $x^{(t+1)} = x^{(t)} - \gamma_t \nabla f(x^{(t)})$

where $\gamma_t > 0$ is the step size (sometimes called the learning rate)
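To make the update above concrete, here is a minimal gradient descent sketch in Python; the quadratic example, the fixed step size of 0.1, and the tolerance are illustrative choices rather than anything specified in the slides.

import numpy as np

def gradient_descent(grad, x0, step_size=0.1, tol=1e-6, max_iters=1000):
    """Minimize a differentiable function given its gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # stop once the gradient is small
            break
        x = x - step_size * g         # fixed step size (learning rate)
    return x

# Example: minimize f(x) = (x1 - 1)^2 + 2 * x2^2
grad_f = lambda x: np.array([2 * (x[0] - 1), 4 * x[1]])
print(gradient_descent(grad_f, x0=[5.0, 5.0]))   # approaches (1, 0)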
Gradient Descent

• At a high level, gradient descent uses the first order Taylor expansion to approximate the function locally

• It does not take into account the curvature of the function, i.e., how quickly the gradient is changing

• This means that it can dramatically overshoot the optimum with a fixed step size
Second Order Methods

• Instead of using only the first derivatives, second order methods use the first three terms of the multivariate Taylor series expansion:

$$f(x) \approx f(x^{(t)}) + \nabla f(x^{(t)})^T (x - x^{(t)}) + \frac{1}{2} (x - x^{(t)})^T \nabla^2 f(x^{(t)}) (x - x^{(t)})$$

• The matrix of second partial derivatives, $\nabla^2 f$, is the Hessian, sometimes written $H_f$

• There are a variety of second order methods

• Newton
• Gauss-Newton
• Quasi-Newton
• BFGS
• L-BFGS
Newton’s Method

• Again, let’s start with the univariate case, $g : \mathbb{R} \to \mathbb{R}$
• Newton’s method is a root-finding algorithm, i.e., it seeks a solution to the equation $g(x) = 0$
• How it works:
• Compute the first order approximation at $x^{(t)}$: $g(x) \approx g(x^{(t)}) + g'(x^{(t)}) (x - x^{(t)})$
• Solve for the point where the approximation is zero: $x = x^{(t)} - g(x^{(t)}) / g'(x^{(t)})$
• Set $x^{(t+1)} = x^{(t)} - g(x^{(t)}) / g'(x^{(t)})$
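A minimal Python sketch of the root-finding iteration just described; the example function and starting point are illustrative assumptions, not from the slides.

def newton_root(g, g_prime, x0, tol=1e-10, max_iters=100):
    """Find a root of g, i.e., a solution of g(x) = 0."""
    x = x0
    for _ in range(max_iters):
        gx = g(x)
        if abs(gx) < tol:             # close enough to a root
            break
        x = x - gx / g_prime(x)       # jump to the root of the linear approximation
    return x

# Example: the positive root of g(x) = x^2 - 2 is sqrt(2)
print(newton_root(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0))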
Newton’s Method

[Figure slides: graphical illustration of successive Newton root-finding iterations.]
Newton’s Method

• I thought we were talking about second order methods?
• We are: Newton’s method for optimization applies the previous strategy to find the zeros of the first derivative, i.e., it solves $f'(x) = 0$
• Approximate: $f'(x) \approx f'(x^{(t)}) + f''(x^{(t)}) (x - x^{(t)})$
• Update: $x^{(t+1)} = x^{(t)} - f'(x^{(t)}) / f''(x^{(t)})$
• This is equivalent to minimizing the second order approximation!
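To spell out the last bullet (a standard one-line argument, filled in here because the slide’s formula did not survive extraction): minimizing the second order approximation $q$ gives exactly the same update.

q(x) = f(x^{(t)}) + f'(x^{(t)}) (x - x^{(t)}) + \tfrac{1}{2} f''(x^{(t)}) (x - x^{(t)})^2
\qquad
q'(x) = 0 \;\Longrightarrow\; x = x^{(t)} - \frac{f'(x^{(t)})}{f''(x^{(t)})}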
Multivariate Newton’s Method

• Update: $x^{(t+1)} = x^{(t)} - \left[ \nabla^2 f(x^{(t)}) \right]^{-1} \nabla f(x^{(t)})$
• Recall the inverse of a matrix $M$, written $M^{-1}$, is the matrix such that $M M^{-1} = M^{-1} M = I$, the identity matrix
• Inverses do not always exist; if the inverse doesn’t exist, then there may be no solution or infinitely many solutions
• Computing the inverse can be expensive: it requires $O(n^3)$ operations for an $n \times n$ matrix
• Computing the Hessian matrix itself can be computationally expensive
• Newton’s method can converge faster than gradient methods, but it is less robust in general
Gradient Descent

[Figure: gradient descent iterates with a diminishing step size rule.]

Newton’s Method

[Figure: Newton’s method iterates, for comparison.]
Newton Direction

• If $f$ is convex, then the direction specified by Newton’s method is a descent direction!
• Recall that the Hessian matrix of a convex function is positive semidefinite everywhere
• A matrix $M$ is positive semidefinite if $x^T M x \geq 0$ for all $x$
• Newton direction: $d^{(t)} = -\left[ \nabla^2 f(x^{(t)}) \right]^{-1} \nabla f(x^{(t)})$

The size of the Newton step can also be used as a stopping criterion, e.g., stop when $\nabla f(x^{(t)})^T \left[ \nabla^2 f(x^{(t)}) \right]^{-1} \nabla f(x^{(t)})$ is sufficiently small
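Why the Newton direction is a descent direction (standard reasoning, filled in since the slide’s formula did not survive extraction): the inverse of a positive definite Hessian is itself positive definite, so

\nabla f(x^{(t)})^T d^{(t)} = -\,\nabla f(x^{(t)})^T \left[ \nabla^2 f(x^{(t)}) \right]^{-1} \nabla f(x^{(t)}) < 0
\quad \text{whenever } \nabla^2 f(x^{(t)}) \succ 0 \text{ and } \nabla f(x^{(t)}) \neq 0 .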
Convergence

• Because Newton’s method specifies a descent direction, we can use the same kinds of convergence criteria that we used for gradient descent!
• We could choose different step sizes, use line search methods, etc. with this direction
• One computational note: the inverse is often not computed explicitly
• Most numerical methods packages have a special routine to solve linear systems of the form $Ax = b$ (here, $\nabla^2 f(x^{(t)})\, d = -\nabla f(x^{(t)})$)
• For example, numpy.linalg.solve() in Python
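A sketch of a single Newton step that solves the linear system instead of forming the inverse, as the slide suggests; the quadratic test problem below is an illustrative assumption.

import numpy as np

def newton_step(grad, hess, x):
    """One Newton step: solve H d = -g rather than computing H^{-1}."""
    g = grad(x)
    H = hess(x)
    d = np.linalg.solve(H, -g)   # Newton direction
    return x + d                 # full step; a line search could scale d instead

# Example on a quadratic f(x) = 0.5 x^T A x - b^T x (minimizer solves A x = b)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b
hess = lambda x: A
print(newton_step(grad, hess, np.zeros(2)))   # reaches the minimizer in one step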
Equality Constrained Newton

• Two different approaches:

• One based on duality

• One based on quadratic programming

• The aim is just to make sure that the step taken via Newton’s method stays inside the constraint set
Equality Constrained Newton with Duality

• Primal problem: minimize $f(x)$ subject to $Ax = b$

• Dual problem: maximize the dual function $q(\lambda) = \inf_x \left[ f(x) + \lambda^T (Ax - b) \right]$

A function that appears when the dual is written out comes up frequently in convex optimization, and it gets a special name (see the rewriting below)
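The rewriting that exposes that function (a standard manipulation, reconstructed here because the slide’s own formula did not survive extraction):

q(\lambda) = \inf_x \left[ f(x) + \lambda^T (Ax - b) \right]
           = -b^T \lambda - \sup_x \left[ (-A^T \lambda)^T x - f(x) \right]
           = -b^T \lambda - f^*(-A^T \lambda),

where $f^*$ is the convex conjugate defined on the next slide.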
The Convex Conjugate

• The convex conjugate of a function $f$ is defined by $f^*(y) = \sup_x \left[ y^T x - f(x) \right]$

• The convex conjugate is always a convex function, even if $f$ is not a convex function (it is a pointwise supremum of convex functions of $y$)
• The conjugate is a special case of Lagrange duality
• If $f$ is convex (and closed), then $f^{**} = f$
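A worked example that is not on the slides but is often useful to keep in mind: the conjugate of a strictly convex quadratic.

f(x) = \tfrac{1}{2} x^T Q x \;\; (Q \succ 0)
\qquad\Longrightarrow\qquad
f^*(y) = \sup_x \left[ y^T x - \tfrac{1}{2} x^T Q x \right] = \tfrac{1}{2} y^T Q^{-1} y ,

since the supremum is attained at $x = Q^{-1} y$.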
Equality Constrained Newton with Duality

• Primal problem: minimize $f(x)$ subject to $Ax = b$

• Dual problem: maximize $q(\lambda) = -b^T \lambda - f^*(-A^T \lambda)$

If this dual function is twice differentiable, we can apply Newton’s method to solve the dual
Equality Constrained Newton via QP/KKT

• Instead of constructing the dual problem, we can directly modify Newton’s method so that it never takes a step outside the set of constraints
• Recall that Newton’s method steps to a minimum of the second order approximation at a point:

$$\min_v \; f(x^{(t)}) + \nabla f(x^{(t)})^T v + \tfrac{1}{2} v^T \nabla^2 f(x^{(t)}) v$$

• Instead, solve the constrained optimization problem

$$\min_v \; f(x^{(t)}) + \nabla f(x^{(t)})^T v + \tfrac{1}{2} v^T \nabla^2 f(x^{(t)}) v \quad \text{subject to } A (x^{(t)} + v) = b$$
Equality Constrained Newton via QP/KKT

• Pick an initial point $x^{(0)}$ such that $A x^{(0)} = b$

• Solve the optimization problem

$$v^{(t)} = \arg\min_v \; \nabla f(x^{(t)})^T v + \tfrac{1}{2} v^T \nabla^2 f(x^{(t)}) v \quad \text{subject to } A v = 0$$

• Update $x^{(t+1)} = x^{(t)} + v^{(t)}$

Note that the iterates stay feasible: if $A x^{(t)} = b$ and $A v^{(t)} = 0$, then $A x^{(t+1)} = b$
Equality Constrained Newton via QP/KKT

• The solution to this optimization problem can be written in almost closed form

• As long as there is at least one feasible point, Slater’s condition implies that strong duality holds

• The KKT conditions are then necessary and sufficient
Equality Constrained Newton via QP/KKT

• Writing out the KKT conditions for this quadratic subproblem gives a linear system in the step $v$ and the dual variables (see the sketch below)

• Again, existing tools can be applied to solve these kinds of linear systems
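The slides’ equations here did not survive extraction; the system is usually written as follows (standard form, with $H = \nabla^2 f(x^{(t)})$, $g = \nabla f(x^{(t)})$, and multiplier $w$):

\begin{bmatrix} H & A^T \\ A & 0 \end{bmatrix}
\begin{bmatrix} v \\ w \end{bmatrix}
=
\begin{bmatrix} -g \\ 0 \end{bmatrix}

A routine such as numpy.linalg.solve can be applied to this block system directly.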
Approximate Newton

• Computing the Hessian matrix is computationally expensive and may be difficult to obtain in closed form (or the Hessian may not even be invertible!)
• Idea: can we approximate the Hessian the same way that we approximated the derivatives on HW 1, i.e., using the secant method?
• For univariate functions:

$$f''(x^{(t)}) \approx \frac{f'(x^{(t)}) - f'(x^{(t-1)})}{x^{(t)} - x^{(t-1)}}$$

Use the sequence of iterates to approximate the 2nd derivative!
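A minimal univariate sketch of this idea; the example function, starting points, and tolerance are illustrative assumptions, not from the slides.

def secant_newton(f_prime, x_prev, x_curr, tol=1e-10, max_iters=100):
    """Minimize a univariate function, approximating f'' from two iterates."""
    for _ in range(max_iters):
        g_prev, g_curr = f_prime(x_prev), f_prime(x_curr)
        if abs(g_curr) < tol or x_curr == x_prev:
            break
        # Secant approximation of the second derivative
        h_approx = (g_curr - g_prev) / (x_curr - x_prev)
        x_prev, x_curr = x_curr, x_curr - g_curr / h_approx
    return x_curr

# Example: f(x) = x^4 has f'(x) = 4 x^3 and minimizer x = 0
print(secant_newton(lambda x: 4 * x**3, x_prev=1.5, x_curr=1.0))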
Approximate Newton

• Computing the Hessian matrix is computationally expensive and may be difficult to obtain in closed form (or the Hessian may not even be invertible!)
• Idea: can we approximate the Hessian the same way that we approximated the derivatives on HW 1, i.e., using the secant method?
• For multivariate functions, the analogous condition on the iterates is

$$\nabla^2 f(x^{(t+1)}) \left( x^{(t+1)} - x^{(t)} \right) \approx \nabla f(x^{(t+1)}) - \nabla f(x^{(t)})$$

Use the sequence of iterates to approximate the 2nd derivative!

• The key idea is to replace $\nabla^2 f(x^{(t+1)})$ with a good approximation $B^{(t+1)}$ that yields equality in this expression, but is much easier to compute

• Note that this is a system of $n$ equations for an $n \times n$ Hessian, so the system is underdetermined: there could be many possible substitutions for $B^{(t+1)}$
Quasi-Newton Methods

• Using the previous approximation, quasi-Newton methods seek to generate a series of approximate Hessian matrices such that the matrix $B^{(t+1)}$ only depends on the matrix $B^{(t)}$ and satisfies the secant constraint

$$B^{(t+1)} \left( x^{(t+1)} - x^{(t)} \right) = \nabla f(x^{(t+1)}) - \nabla f(x^{(t)})$$

• A wide variety of methods have been proposed to accomplish this
• The most popular in practice are BFGS and its lower memory counterpart L-BFGS (see the usage sketch below)
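For context (not part of the slides), this is how these methods are typically invoked through SciPy; the objective below is an illustrative example.

import numpy as np
from scipy.optimize import minimize

# Illustrative smooth objective with minimizer (1, 1)
def f(x):
    return (x[0] - 1) ** 2 + 10 * (x[1] - 1) ** 2

def grad_f(x):
    return np.array([2 * (x[0] - 1), 20 * (x[1] - 1)])

# BFGS maintains a dense approximation of the inverse Hessian;
# L-BFGS-B keeps only a few recent (s, y) pairs instead.
result = minimize(f, x0=np.zeros(2), jac=grad_f, method="L-BFGS-B")
print(result.x)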
Broyden-Fletcher-Goldfarb-Shanno (BFGS)

• Choose $B^{(t+1)}$ to be a symmetric positive definite matrix whose inverse satisfies

$$\left[ B^{(t+1)} \right]^{-1} \left( \nabla f(x^{(t+1)}) - \nabla f(x^{(t)}) \right) = x^{(t+1)} - x^{(t)}$$

and whose change from the previous approximation is as small as possible
Broyden-Fletcher-Goldfarb-Shanno (BFGS)

$$\min_H \; \left\| H - \left[ B^{(t)} \right]^{-1} \right\| \quad \text{such that} \quad H = H^T, \;\; H \left( \nabla f(x^{(t+1)}) - \nabla f(x^{(t)}) \right) = x^{(t+1)} - x^{(t)}$$

This is a convex optimization problem!
(note that the solution may not be strictly positive definite: if this happens, reinitialize the approximation to a nice positive definite matrix like the identity)
Broyden-Fletcher-Goldfarb-Shanno (BFGS)

$$\min_H \; \left\| H - \left[ B^{(t)} \right]^{-1} \right\| \quad \text{such that} \quad H = H^T, \;\; H \left( \nabla f(x^{(t+1)}) - \nabla f(x^{(t)}) \right) = x^{(t+1)} - x^{(t)}$$

Its solution is... messy... (the standard form of the resulting update is sketched below)
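The closed form on the final slide did not survive extraction; for reference, the update is usually stated as follows (standard BFGS notation, with $s_t = x^{(t+1)} - x^{(t)}$, $y_t = \nabla f(x^{(t+1)}) - \nabla f(x^{(t)})$, and $H_t = [B^{(t)}]^{-1}$):

H_{t+1} = \left( I - \rho_t\, s_t y_t^T \right) H_t \left( I - \rho_t\, y_t s_t^T \right) + \rho_t\, s_t s_t^T ,
\qquad \rho_t = \frac{1}{y_t^T s_t}

L-BFGS avoids storing $H_t$ explicitly and instead keeps only the most recent few $(s_t, y_t)$ pairs.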
