
Matrix calculus. Line search.

Seminar

Optimization for ML. Faculty of Computer Science. HSE University

Theory recap. Differential

• Differential df(x)[·] : U → V at a point x ∈ U for f(·) : U → V:

f(x + h) − f(x) = df(x)[h] + o(‖h‖),   where df(x)[h] is the differential

• Canonical form of the differential (rows: domain U, columns: codomain V):

U → V      V = ℝ             V = ℝⁿ       V = ℝⁿˣᵐ
U = ℝ      f′(x) dx          ∇f(x) dx     ∇f(x) dx
U = ℝⁿ     ∇f(x)ᵀ dx         J(x) dx      —
U = ℝⁿˣᵐ   tr(∇f(X)ᵀ dX)     —            —

Theory recap. Differentiation Rules

• Useful differentiation rules and standard derivatives (a numerical spot check follows the lists):

Differentiation Rules:

• dA = 0
• d(αX) = α(dX)
• d(AXB) = A(dX)B
• d(X + Y) = dX + dY
• d(Xᵀ) = (dX)ᵀ
• d(XY) = (dX)Y + X(dY)
• d(⟨X, Y⟩) = ⟨dX, Y⟩ + ⟨X, dY⟩
• d(X/ϕ) = (ϕ dX − (dϕ)X)/ϕ²

Standard Derivatives:

• d(⟨A, X⟩) = ⟨A, dX⟩
• d(⟨Ax, x⟩) = ⟨(A + Aᵀ)x, dx⟩
• d(Det(X)) = Det(X)⟨X⁻ᵀ, dX⟩
• d(X⁻¹) = −X⁻¹(dX)X⁻¹
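As a quick spot check of one of the less obvious rules above, d(Det(X)) = Det(X)⟨X⁻ᵀ, dX⟩, here is a minimal NumPy sketch (the random test matrices and step size are arbitrary choices, not from the seminar notebook):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
X = rng.standard_normal((n, n)) + n * np.eye(n)   # shifted so X is comfortably invertible
dX = 1e-6 * rng.standard_normal((n, n))           # small perturbation playing the role of dX

# ⟨X⁻ᵀ, dX⟩ = tr((X⁻ᵀ)ᵀ dX) = tr(X⁻¹ dX)
lhs = np.linalg.det(X + dX) - np.linalg.det(X)
rhs = np.linalg.det(X) * np.trace(np.linalg.inv(X) @ dX)
print(lhs, rhs)   # the two numbers should agree up to o(‖dX‖)
```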
Theory recap. Differential and Gradient / Hessian

We can retrieve the gradient using the following formula:

df(x) = ⟨∇f(x), dx⟩

Then, if we have a differential of the above form and need the second derivative of the matrix/vector function, we treat the "old" dx as a constant dx₁ and compute d(df) = d²f(x):

d²f(x) = ⟨∇²f(x) dx₁, dx⟩ = ⟨Hf(x) dx₁, dx⟩

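To make the recap concrete, here is a minimal NumPy sketch (an illustration, not course code) that retrieves the gradient and Hessian of f(x) = ⟨Ax, x⟩ from its differential, using the rule d⟨Ax, x⟩ = ⟨(A + Aᵀ)x, dx⟩ from the table above, and checks both numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
dx = 1e-4 * rng.standard_normal(n)   # small perturbation playing the role of dx

f = lambda x: x @ A @ x              # f(x) = ⟨Ax, x⟩

grad = (A + A.T) @ x                 # from df = ⟨(A + Aᵀ)x, dx⟩
hess = A + A.T                       # differentiating once more: d²f = ⟨(A + Aᵀ)dx₁, dx⟩

# First-order check: f(x + dx) − f(x) ≈ ⟨∇f(x), dx⟩
print(f(x + dx) - f(x), grad @ dx)
# Second-order check: for this quadratic f the remainder equals ½⟨Hf(x)dx, dx⟩ exactly
print(f(x + dx) - f(x) - grad @ dx, 0.5 * dx @ hess @ dx)
```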
Theory recap. Line Search

• Solution localization methods:
  • Dichotomy search method
  • Golden section search method
• Inexact line search:
  • Sufficient decrease
  • Goldstein conditions
  • Curvature conditions
  • The idea behind backtracking line search (a sketch follows below)
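Since the last bullet only names the idea, here is a minimal backtracking sketch built around the sufficient decrease (Armijo) condition. It is an illustration with assumed parameter names (alpha0, rho, c1), not the course implementation:

```python
import numpy as np

def backtracking_line_search(f, grad_f, x, d, alpha0=1.0, rho=0.5, c1=1e-4):
    """Shrink the step until the sufficient decrease (Armijo) condition holds:
    f(x + alpha d) <= f(x) + c1 * alpha * <grad f(x), d>.
    Parameter defaults are illustrative, not taken from the course code."""
    alpha = alpha0
    fx, slope = f(x), grad_f(x) @ d          # slope < 0 for a descent direction d
    while f(x + alpha * d) > fx + c1 * alpha * slope:
        alpha *= rho                         # backtrack: shrink the step by the factor rho
    return alpha

# Toy usage: one steepest-descent step on a simple quadratic
A = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x @ A @ x
grad_f = lambda x: A @ x
x = np.array([1.0, 1.0])
d = -grad_f(x)                               # steepest-descent direction
alpha = backtracking_line_search(f, grad_f, x, d)
print(alpha, f(x + alpha * d) < f(x))        # the accepted step decreases f
```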
Matrix Calculus. Problem 1

Example

Find ∇f(x), if f(x) = ½ xᵀAx + bᵀx + c.

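Not the seminar's official solution, but one way to sanity-check a candidate answer: the rules above give df = ½⟨(A + Aᵀ)x, dx⟩ + ⟨b, dx⟩, so the candidate gradient is ∇f(x) = ½(A + Aᵀ)x + b (equal to Ax + b when A is symmetric). A finite-difference check in NumPy, with arbitrary test data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A, b, c = rng.standard_normal((n, n)), rng.standard_normal(n), 1.5
x, dx = rng.standard_normal(n), 1e-6 * rng.standard_normal(n)

f = lambda x: 0.5 * x @ A @ x + b @ x + c
grad = 0.5 * (A + A.T) @ x + b       # candidate gradient, to be verified

# f(x + dx) − f(x) should match ⟨∇f(x), dx⟩ up to o(‖dx‖)
print(f(x + dx) - f(x), grad @ dx)
```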


Matrix Calculus. Problem 2

Example

Find ∇f(X), if f(X) = tr(AX⁻¹B).

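Again, a sketch of a check rather than the official solution: using d(X⁻¹) = −X⁻¹(dX)X⁻¹ and the cyclic property of the trace, df = −tr(X⁻¹BAX⁻¹ dX), which suggests the candidate ∇f(X) = −(X⁻¹BAX⁻¹)ᵀ. A numerical check with arbitrary test matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
X = rng.standard_normal((n, n)) + n * np.eye(n)   # keep X well-conditioned
dX = 1e-6 * rng.standard_normal((n, n))

f = lambda X: np.trace(A @ np.linalg.inv(X) @ B)
Xinv = np.linalg.inv(X)
grad = -(Xinv @ B @ A @ Xinv).T                   # candidate gradient, to be verified

# ⟨∇f(X), dX⟩ = tr(∇f(X)ᵀ dX) should match f(X + dX) − f(X) up to o(‖dX‖)
print(f(X + dX) - f(X), np.trace(grad.T @ dX))
```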


Matrix Calculus. Problem 3

Example

Find the gradient ∇f(x) and Hessian ∇²f(x), if f(x) = ⅓‖x‖₂³.

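As before, a candidate answer with a numerical check (a sketch, not the official solution): writing f(x) = ⅓(xᵀx)^(3/2), the chain rule gives the candidate ∇f(x) = ‖x‖₂ x, and differentiating once more, ∇²f(x) = ‖x‖₂ I + xxᵀ/‖x‖₂ for x ≠ 0:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
x = rng.standard_normal(n)
dx = 1e-4 * rng.standard_normal(n)

f = lambda x: np.linalg.norm(x) ** 3 / 3
nx = np.linalg.norm(x)
grad = nx * x                                   # candidate gradient ‖x‖ x
hess = nx * np.eye(n) + np.outer(x, x) / nx     # candidate Hessian ‖x‖ I + x xᵀ / ‖x‖

print(f(x + dx) - f(x), grad @ dx)              # first-order check
print(f(x + dx) - f(x) - grad @ dx, 0.5 * dx @ hess @ dx)   # second-order check
```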


Line Search. Example 1: Comparison of Methods (Colab ♣)

f₁(x) = x(x − 2)(x + 2)² + 10,   [a, b] = [−3, 2]

• Random search: 72 function calls, 36 iterations, f₁* = 0.09
• Binary search: 23 function calls, 13 iterations, f₁* = 10.00
• Golden search: 19 function calls, 18 iterations, f₁* = 10.00
• Parabolic search: 20 function calls, 17 iterations, f₁* = 10.00

Figure 1: Comparison of different line search algorithms with f₁



Line Search. Example 1: Comparison of Methods (Colab ♣)
r x2 Random search: 68 function calls. 34 iterations. f2∗ = 0.71
2 x2 e− 8
f2 (x) = − Binary search: 23 function calls. 13 iterations. f2∗ = 0.71
π 8 Golden search: 20 function calls. 19 iterations. f2∗ = 0.71
[a, b] = [0, 6] Parabolic search: 17 function calls. 14 iterations. f2∗ = 0.71

Figure 2: Comparison of different line search algorithms with f2



Line Search. Example 1: Comparison of Methods (Colab ♣)

f₃(x) = sin(sin(sin(√(x/2)))),   [a, b] = [5, 70]

• Random search: 66 function calls, 33 iterations, f₃* = 0.25
• Binary search: 32 function calls, 17 iterations, f₃* = 0.25
• Golden search: 25 function calls, 24 iterations, f₃* = 0.25
• Parabolic search: 103 function calls, 100 iterations, f₃* = 0.25

Figure 3: Comparison of different line search algorithms with f₃

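The exact implementations live in the Colab notebook; as a rough, stand-alone sketch of such a comparison, here is a plain golden-section search counting its function calls next to SciPy's bounded Brent-type minimizer on f₁ (call counts and minima will not match the slides exactly, and neither routine is the notebook's code):

```python
import numpy as np
from scipy.optimize import minimize_scalar

calls = 0
def f1(x):
    global calls
    calls += 1
    return x * (x - 2) * (x + 2) ** 2 + 10

def golden_section(f, a, b, tol=1e-5):
    """Plain golden-section search on [a, b]; guaranteed only for unimodal f."""
    invphi = (np.sqrt(5) - 1) / 2                    # 1/φ ≈ 0.618
    x1, x2 = b - invphi * (b - a), a + invphi * (b - a)
    fx1, fx2 = f(x1), f(x2)
    while b - a > tol:
        if fx1 < fx2:                                # minimum lies in [a, x2]
            b, x2, fx2 = x2, x1, fx1
            x1 = b - invphi * (b - a)
            fx1 = f(x1)
        else:                                        # minimum lies in [x1, b]
            a, x1, fx1 = x1, x2, fx2
            x2 = a + invphi * (b - a)
            fx2 = f(x2)
    return (a + b) / 2

calls = 0
x_gs = golden_section(f1, -3, 2)
print("golden section:", x_gs, "calls:", calls)

calls = 0
res = minimize_scalar(f1, bounds=(-3, 2), method="bounded")   # SciPy's bounded Brent-type method
print("bounded Brent:", res.x, "calls:", calls)
```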


Line Search. Example 2: The Brent Method

• Parabolic interpolation + golden section search = Brent method
• The key idea of the method is to track the value of the optimized scalar function at six points a, b, x, w, v, u
• [a, b] − the localization interval at the current iteration
• The points x, w and v are such that the inequality f(x) ⩽ f(w) ⩽ f(v) holds
• u − the minimum of a parabola built on the points x, w and v, or the golden-section point of the larger of the intervals [a, x] and [x, b]

Figure 4: Idea of Brent Method



Line Search. Example 2: The Brent Method

A parabola is constructed only if the points x, w and v are distinct, and its vertex u* is taken as the point u only if
• u* ∈ [a, b]
• u* is at most half the length of the step taken before the previous one away from the point x
• If the conditions above are not met, the point u is found by a golden-section step
• Example in Colab ♣ (see also the SciPy sketch below)

Figure 5: An example of how the Brent Method works

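For comparison with the Colab example, SciPy ships a Brent-type scalar minimizer of exactly this flavor (parabolic steps safeguarded by golden-section steps). A minimal usage sketch on the f₁ from Example 1; the bracket is only a starting guess and the numbers will differ from the notebook:

```python
from scipy.optimize import minimize_scalar

f1 = lambda x: x * (x - 2) * (x + 2) ** 2 + 10

# method='brent' mixes parabolic-interpolation steps with golden-section
# safeguard steps; the bracket is a starting guess, not a hard constraint.
res = minimize_scalar(f1, bracket=(-3, 2), method="brent")
print(res.x, res.fun, res.nfev)
```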
