Lecture 09 - Calculus and Optimization Techniques (3) - Plain
Constrained Optimization
Projected Gradient Descent
Consider an optimization problem of the form

    min_w L(w)   subject to   w ∈ C

where C is the constraint set. Projected GD first takes the usual gradient step and then projects the result back onto C:

    z^(t+1) = w^(t) - η_t ∇L(w^(t))
    w^(t+1) = Π_C[z^(t+1)]

where Π_C[z] denotes the projection of z onto the set C.
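A minimal sketch of this update loop in NumPy, assuming the caller supplies the gradient function and the projection operator (these function and argument names are illustrative, not from the slides):

import numpy as np

def projected_gd(grad, project, w0, lr=0.1, num_iters=100):
    # Projected GD: usual gradient step, then project back onto the constraint set C
    w = w0.copy()
    for _ in range(num_iters):
        z = w - lr * grad(w)   # gradient step; z may fall outside C
        w = project(z)         # projection step: map z back to the closest point in C
    return w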
Projected GD: How to Project?
Here, projecting a point z means finding the “closest” point to z in the constraint set C, i.e., Π_C[z] = argmin_{w ∈ C} ||w - z||.
Projected GD is commonly used only when the projection step is simple and efficient to compute.
For some sets, the projection step is easy:
Unit-radius Euclidean ball: projection = normalize the vector to unit Euclidean length (points already inside the ball are left unchanged).
Set of non-negative reals: projection = set each negative entry of the vector to zero.
[Figure: projections onto the unit Euclidean ball and onto the non-negative orthant]
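A sketch of these two projection operators in NumPy (illustrative code, not from the slides):

import numpy as np

def project_unit_ball(z):
    # Projection onto the unit-radius Euclidean ball: rescale only if z lies outside the ball
    norm = np.linalg.norm(z)
    return z / norm if norm > 1.0 else z

def project_nonnegative(z):
    # Projection onto the non-negative orthant: clip each negative entry to zero
    return np.maximum(z, 0.0)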
Proximal Gradient Descent
Consider minimizing a regularized loss function of the form

    L(w) + R(w)

where R(w) is the regularizer (note: the regularization hyperparameter is assumed to be part of R(w) itself). Proximal GD takes a gradient step on the smooth loss L and then applies the proximal operator of the regularizer R:

    w^(t+1) = prox_{η_t R}[ w^(t) - η_t ∇L(w^(t)) ]
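A minimal sketch, assuming R(w) = lam * ||w||_1 so that its proximal operator is entry-wise soft-thresholding (the L1 choice and the names here are an illustration, not from the slides):

import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1: shrink each entry of z toward zero by t
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def proximal_gd(grad_L, w0, lam=0.1, lr=0.1, num_iters=100):
    # Proximal GD for L(w) + lam * ||w||_1
    w = w0.copy()
    for _ in range(num_iters):
        z = w - lr * grad_L(w)            # gradient step on the smooth part L
        w = soft_threshold(z, lr * lam)   # prox step handles the non-smooth L1 part
    return w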
Constrained Opt. via Lagrangian
For a constrained problem of the form min_w f(w) subject to g(w) ≤ 0, the Lagrangian is

    L(w, α) = f(w) + α g(w)

where α ≥ 0 is the Lagrange multiplier.
Therefore, we can write our original problem as

    min_w max_{α ≥ 0} L(w, α)

since the inner maximization equals f(w) when the constraint g(w) ≤ 0 holds, and +∞ when it is violated.
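As a small worked illustration (this concrete example is not from the slides), minimizing f(w) = ½||w||² subject to the equality constraint aᵀw = 1 via the Lagrangian:

% Worked example (illustrative), in LaTeX
\begin{aligned}
\mathcal{L}(\mathbf{w}, \alpha) &= \tfrac{1}{2}\|\mathbf{w}\|^2 + \alpha\,(1 - \mathbf{a}^\top \mathbf{w}) \\
\nabla_{\mathbf{w}} \mathcal{L} = \mathbf{w} - \alpha\,\mathbf{a} = \mathbf{0}
  &\;\Rightarrow\; \mathbf{w} = \alpha\,\mathbf{a} \\
\mathbf{a}^\top \mathbf{w} = 1
  &\;\Rightarrow\; \alpha = \frac{1}{\|\mathbf{a}\|^2},
  \qquad \mathbf{w}^\star = \frac{\mathbf{a}}{\|\mathbf{a}\|^2}
\end{aligned}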
Co-ordinate Descent (CD)
Standard gradient descent update for w:

    w^(t+1) = w^(t) - η_t ∇L(w^(t))

CD: In each iteration, update only one entry (co-ordinate) of w, keeping all the others fixed.
Usually converges only to a local optimum, but it is very useful in practice. We will see examples later.
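A minimal sketch of exact co-ordinate descent for least-squares regression, where each co-ordinate can be minimized in closed form (an illustrative example, not from the slides):

import numpy as np

def coordinate_descent_ls(X, y, num_epochs=50):
    # Exact co-ordinate descent for min_w ||y - X w||^2:
    # each step re-solves for one co-ordinate w_j with all the others held fixed
    n, d = X.shape
    w = np.zeros(d)
    residual = y - X @ w                        # current residual y - X w
    for _ in range(num_epochs):
        for j in range(d):
            x_j = X[:, j]
            # add back w_j's contribution, then minimize over w_j alone
            rho = x_j @ (residual + x_j * w[j])
            w_j_new = rho / (x_j @ x_j)         # assumes column j is not all zeros
            residual += x_j * (w[j] - w_j_new)  # keep residual consistent with the new w_j
            w[j] = w_j_new
    return w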
Newton’s Method
Unlike GD and its variants, Newton’s method uses second-order information
(second derivative, a.k.a. the Hessian)
At each point w^(t), minimize the quadratic (second-order) approximation of L(w):

    L(w) ≈ L(w^(t)) + ∇L(w^(t))^T (w - w^(t)) + ½ (w - w^(t))^T H^(t) (w - w^(t))

where H^(t) = ∇²L(w^(t)) is the Hessian at w^(t).
Exercise: show that minimizing this quadratic approximation over w gives the Newton update

    w^(t+1) = w^(t) - (H^(t))^{-1} ∇L(w^(t))
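A minimal sketch of the Newton iteration on a simple smooth test function (the function and names here are illustrative, not from the slides):

import numpy as np

def newtons_method(grad, hessian, w0, num_iters=20):
    # Newton's method: solve H^(t) d = grad(w^(t)) and move to w^(t) - d
    w = w0.copy()
    for _ in range(num_iters):
        d = np.linalg.solve(hessian(w), grad(w))   # Newton direction, without forming the inverse explicitly
        w = w - d
    return w

# Example: minimize L(w) = 0.25 * sum(w_i^4) + 0.5 * ||w||^2, whose minimizer is w = 0
grad = lambda w: w**3 + w                       # gradient of L
hessian = lambda w: np.diag(3.0 * w**2 + 1.0)   # Hessian of L (diagonal for this L)
print(newtons_method(grad, hessian, np.array([2.0, -1.5])))   # converges to values near zero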