Opt Lec 10
Conditions for Local Maximizers/Minimizers

Consider the problem of finding x ∈ D such that f(x) attains its maximum (minimum), where f(x) is twice continuously differentiable. This problem is an unconstrained optimization problem if an optimal solution x* is an interior point of D, i.e., x* ∈ int D.

First-order necessary condition: if x* is a local maximizer (minimizer) of f(x), then ∇f(x*) = 0.

Second-order necessary condition: if x* is a local maximizer (minimizer) of f(x), then the Hessian matrix H(x*) of f(x) at x* is negative (positive) semidefinite.

Sufficient condition: if ∇f(x*) = 0 and the Hessian matrix H(x*) is negative (positive) definite, then f(x) attains a strict local maximum (minimum) at x*.
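As a quick numerical illustration (not from the lecture), these conditions can be checked by approximating the gradient and Hessian with finite differences; the test function $f(x_1, x_2) = x_1^2 + x_2^2$ below is an assumed example:

```python
# A minimal sketch: verify the optimality conditions at x* = (0, 0) for the
# assumed example f(x1, x2) = x1^2 + x2^2, using finite differences.
import numpy as np

def f(x):
    return x[0]**2 + x[1]**2

def grad(x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def hessian(x, h=1e-4):
    """Central-difference approximation of the Hessian H(x)."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        H[:, i] = (grad(x + e) - grad(x - e)) / (2 * h)
    return H

x_star = np.array([0.0, 0.0])
print(grad(x_star))                         # ≈ 0: first-order condition holds
print(np.linalg.eigvalsh(hessian(x_star)))  # all eigenvalues > 0: positive definite
```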
The Training Problem

$$\min_{w \in \mathbb{R}^d} \; \frac{1}{n}\sum_{i=1}^{n} \ell(h_w(x_i),\, y_i) + \lambda R(w)$$

$\sum_{i=1}^{n} \ell(h_w(x_i), y_i)$: goodness of fit

$\lambda$: controls the tradeoff between fit and complexity

$\lambda R(w)$: penalizes complexity

$R(w) = \|w\|_2^2,\ \|w\|_1,\ \|w\|_p,\ \dots$
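A sketch of this objective in code, assuming a linear model $h_w(x) = w^{\top}x$, squared loss, and the ridge penalty $R(w) = \|w\|_2^2$; the synthetic data is purely illustrative:

```python
# Regularized empirical risk: (1/n) Σ ℓ(h_w(x_i), y_i) + λ R(w),
# assuming h_w(x) = w·x, squared loss, and R(w) = ||w||_2^2.
import numpy as np

def objective(w, X, y, lam):
    fit = np.mean((X @ w - y) ** 2)   # goodness of fit (average loss)
    penalty = lam * np.sum(w ** 2)    # penalizes complexity
    return fit + penalty              # λ controls the tradeoff

# Illustrative synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
print(objective(np.zeros(3), X, y, lam=0.1))
```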
Gradient Methods
The unconstrained problem: find x ∈ ℝⁿ such that f(x) attains its minimum, where f(x) is a twice continuously differentiable function.

The level set of f(x): $\{x \in \mathbb{R}^n : f(x) = c\}$.

The gradient of f at $x^0$: $\nabla f(x^0)$.

By Taylor's theorem, we have
$$f(x^0 + d) = f(x^0) + \nabla f(x^0)^{\top} d + o(\|d\|),$$
so for small $\|d\|$ the objective decreases fastest along $d = -\nabla f(x^0)$. This motivates the following algorithm.
1. Choose an initial point $x^0$.
2. Set $k := 0$.
3. Compute $\nabla f(x^k)$; if $\nabla f(x^k) = 0$, go to Step 7.
4. Choose the step size $\alpha_k = \arg\min_{\alpha \ge 0} f(x^k - \alpha \nabla f(x^k))$.
5. Set $x^{k+1} = x^k - \alpha_k \nabla f(x^k)$.
6. Assign $k := k + 1$ and go back to Step 3.
7. Stop the algorithm and conclude that $x^k$ is an optimal solution.
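A runnable sketch of this loop; the backtracking (Armijo) line search used to pick $\alpha_k$ in Step 4 is an assumption here, since the exact minimization the algorithm calls for is rarely available in closed form (see the quadratic case below):

```python
# Steepest descent, Steps 1–7 above; α_k is chosen by backtracking rather
# than exact minimization (an assumption for this sketch).
import numpy as np

def steepest_descent(f, grad, x0, tol=1e-8, max_iter=10_000):
    x = np.asarray(x0, dtype=float)          # Steps 1–2: initial point, k = 0
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:          # Step 3: ∇f(x^k) ≈ 0 → stop
            break
        alpha = 1.0                          # Step 4: backtracking line search
        while f(x - alpha * g) > f(x) - 0.5 * alpha * (g @ g):
            alpha *= 0.5
        x = x - alpha * g                    # Step 5: x^{k+1} = x^k − α_k ∇f(x^k)
    return x                                 # Step 7: x^k as approximate solution
```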
Gradient Methods
Proposition
If $\{x^k\}_{k=0}^{\infty}$ is a steepest-descent sequence for a given function $f(x)$ and if $\nabla f(x^k) \neq 0$, then $f(x^{k+1}) < f(x^k)$.
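Why this holds, in one line: with $\phi(\alpha) = f(x^k - \alpha \nabla f(x^k))$, the chain rule gives
$$\phi'(0) = -\nabla f(x^k)^{\top} \nabla f(x^k) = -\|\nabla f(x^k)\|^2 < 0,$$
so $f$ strictly decreases along $-\nabla f(x^k)$ for every sufficiently small $\alpha > 0$, and the line-search choice of $\alpha_k$ can only do at least as well.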
Gradient Methods
Examples
Solve these problems using the steepest-descent algorithm (example (a) is worked in the sketch after this list):
a. $f(x_1, x_2) = x_1^2 + x_2^2$, starting from $x^0 = (1, 2)$.
b. $f(x_1, x_2) = \frac{x_1^2}{5} + x_2^2$, starting from $x^0 = (1, 2)$.
c. $f(x_1, x_2) = x_1 + \frac{1}{2}x_2 + \frac{1}{2}x_1^2 + x_2^2 + 3$, starting from $x^0 = (0, 0)$.
d. $f(x) = 4x_1^2 - 4x_1x_2 + 2x_2^2$, starting from $x^0 = (2, 3)$.
e. $f(x) = x_1^2 - 2x_1x_2 + 2x_2^2 + 2x_1$, starting from $x^0 = (0, 0)$.
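For instance, example (a) can be run through the steepest_descent sketch above (an illustration, not part of the lecture); the true minimizer is the origin:

```python
import numpy as np  # reuses steepest_descent from the sketch above

# Example (a): f(x1, x2) = x1^2 + x2^2, x^0 = (1, 2).
f = lambda x: x[0]**2 + x[1]**2
grad = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])
print(steepest_descent(f, grad, [1.0, 2.0]))  # ≈ [0. 0.]
```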
The Method of Steepest Descent with a Quadratic Function

For the quadratic function $f(x) = \frac{1}{2}x^{\top} Q x - b^{\top} x$ with $Q$ symmetric positive definite,
$$\nabla f(x) = Qx - b.$$
The iteration is
$$x^{k+1} = x^k - \alpha_k d^k, \qquad d^k = \nabla f(x^k) = Qx^k - b,$$
where the exact line-search step size has the closed form
$$\alpha_k = \arg\min_{\alpha \ge 0} f(x^k - \alpha d^k) = \frac{(d^k)^{\top} d^k}{(d^k)^{\top} Q d^k}.$$
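A direct transcription of this closed-form step size; the choice of $Q$ and $b$ below matches example (d) above and is otherwise illustrative:

```python
# Steepest descent for f(x) = 1/2 x^T Q x − b^T x with the exact step size
# α_k = (d^k)^T d^k / ((d^k)^T Q d^k), where d^k = ∇f(x^k) = Q x^k − b.
import numpy as np

def steepest_descent_quadratic(Q, b, x0, tol=1e-10, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = Q @ x - b                    # d^k = ∇f(x^k)
        if np.linalg.norm(d) < tol:
            break
        alpha = (d @ d) / (d @ Q @ d)    # exact line search
        x = x - alpha * d
    return x

# Example (d): 4x1^2 − 4x1x2 + 2x2^2 = 1/2 x^T Q x with Q as below, b = 0.
Q = np.array([[8.0, -4.0], [-4.0, 4.0]])
b = np.zeros(2)
print(steepest_descent_quadratic(Q, b, [2.0, 3.0]))  # → [0. 0.]
```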
Newton’s Method (Newton-Raphson Method)
Newton’s method for a nonlinear equation

Let $f : \mathbb{R} \to \mathbb{R}$ be differentiable. We want to find $x$ satisfying $f(x) = 0$. For any $x^k$, linearize $f$ at $x^k$:
$$f_L(x) = f(x^k) + f'(x^k)(x - x^k).$$
Choosing $x^{k+1}$ as the zero of this linearization gives
$$f_L(x^{k+1}) = 0 \iff f(x^k) + f'(x^k)(x^{k+1} - x^k) = 0 \iff x^{k+1} = x^k - \frac{f(x^k)}{f'(x^k)}.$$
To minimize $f$, apply the same idea to the equation $f'(x) = 0$:
$$f'(x^k) + f''(x^k)(x^{k+1} - x^k) = 0 \iff x^{k+1} = x^k - \frac{f'(x^k)}{f''(x^k)}.$$
In $n$ dimensions this becomes Newton's method for minimization,
$$x^{k+1} := x^k - H(x^k)^{-1}\,\nabla f(x^k),$$
where $H(x^k)$ is the Hessian of $f$ at $x^k$.

Example. Minimize $f(x) = x_1^4 + 2x_1^2 x_2^2 + x_2^4$ with starting point $x^0 = (1, 1)$.
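A sketch of Newton's method on this example; the gradient and Hessian below are worked out by hand from $f$:

```python
# Newton's method x^{k+1} = x^k − H(x^k)^{-1} ∇f(x^k) for
# f(x) = x1^4 + 2 x1^2 x2^2 + x2^4, starting from x^0 = (1, 1).
import numpy as np

def grad(x):
    x1, x2 = x
    return np.array([4*x1**3 + 4*x1*x2**2,
                     4*x1**2*x2 + 4*x2**3])

def hess(x):
    x1, x2 = x
    return np.array([[12*x1**2 + 4*x2**2, 8*x1*x2],
                     [8*x1*x2,            4*x1**2 + 12*x2**2]])

x = np.array([1.0, 1.0])
for _ in range(10):
    x = x - np.linalg.solve(hess(x), grad(x))  # Newton step
print(x)  # each step scales x by 2/3, so x → (0, 0)
```

Note that the Hessian of this $f$ vanishes at the minimizer $(0, 0)$, so Newton's method converges only linearly here, shrinking the iterate by the factor $2/3$ per step.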