
Matrix calculus. Line search.

Seminar

Optimization for ML. Faculty of Computer Science. HSE University

Theory recap. Differential

• Differential df(x)[·] : U → V at a point x ∈ U for f(·) : U → V:

f(x + h) − f(x) = df(x)[h] + o(‖h‖),   where df(x)[h] is the differential

• Canonical form of the differential (rows: domain U, columns: codomain V):

U → V      V = ℝ             V = ℝⁿ       V = ℝⁿˣᵐ
U = ℝ      f′(x) dx          ∇f(x) dx     ∇f(x) dx
U = ℝⁿ     ∇f(x)ᵀ dx         J(x) dx      —
U = ℝⁿˣᵐ   tr(∇f(X)ᵀ dX)     —            —

Theory recap. Differentiation Rules

• Useful differentiation rules and standard derivatives (a numerical spot check follows the lists):

Differentiation Rules:

• dA = 0
• d(αX) = α(dX)
• d(AXB) = A(dX)B
• d(X + Y) = dX + dY
• d(Xᵀ) = (dX)ᵀ
• d(XY) = (dX)Y + X(dY)
• d(⟨X, Y⟩) = ⟨dX, Y⟩ + ⟨X, dY⟩
• d(X/ϕ) = (ϕ dX − (dϕ)X)/ϕ²

Standard Derivatives:

• d(⟨A, X⟩) = ⟨A, dX⟩
• d(⟨Ax, x⟩) = ⟨(A + Aᵀ)x, dx⟩
• d(Det(X)) = Det(X)⟨X⁻ᵀ, dX⟩
• d(X⁻¹) = −X⁻¹(dX)X⁻¹
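As a quick spot check of one of the less obvious rules above, d(Det(X)) = Det(X)⟨X⁻ᵀ, dX⟩, here is a minimal NumPy sketch (the random test matrices and step size are arbitrary choices, not from the seminar notebook):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
X = rng.standard_normal((n, n)) + n * np.eye(n)   # shifted so X is comfortably invertible
dX = 1e-6 * rng.standard_normal((n, n))           # small perturbation playing the role of dX

# ⟨X⁻ᵀ, dX⟩ = tr((X⁻ᵀ)ᵀ dX) = tr(X⁻¹ dX)
lhs = np.linalg.det(X + dX) - np.linalg.det(X)
rhs = np.linalg.det(X) * np.trace(np.linalg.inv(X) @ dX)
print(lhs, rhs)   # the two numbers should agree up to o(‖dX‖)
```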
Theory recap. Differential and Gradient / Hessian

We can retrieve the gradient using the following formula:

df(x) = ⟨∇f(x), dx⟩

Then, if we have a differential of the above form and need the second derivative of the matrix/vector function, we treat the "old" dx as a constant dx₁ and compute d(df) = d²f(x):

d²f(x) = ⟨∇²f(x) dx₁, dx⟩ = ⟨Hf(x) dx₁, dx⟩

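To make the recap concrete, here is a minimal NumPy sketch (an illustration, not course code) that retrieves the gradient and Hessian of f(x) = ⟨Ax, x⟩ from its differential, using the rule d⟨Ax, x⟩ = ⟨(A + Aᵀ)x, dx⟩ from the table above, and checks both numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
dx = 1e-4 * rng.standard_normal(n)   # small perturbation playing the role of dx

f = lambda x: x @ A @ x              # f(x) = ⟨Ax, x⟩

grad = (A + A.T) @ x                 # from df = ⟨(A + Aᵀ)x, dx⟩
hess = A + A.T                       # differentiating once more: d²f = ⟨(A + Aᵀ)dx₁, dx⟩

# First-order check: f(x + dx) − f(x) ≈ ⟨∇f(x), dx⟩
print(f(x + dx) - f(x), grad @ dx)
# Second-order check: for this quadratic f the remainder equals ½⟨Hf(x)dx, dx⟩ exactly
print(f(x + dx) - f(x) - grad @ dx, 0.5 * dx @ hess @ dx)
```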
Theory recap. Line Search

• Solution localization methods:
  • Dichotomy search method
  • Golden section search method
• Inexact line search:
  • Sufficient decrease
  • Goldstein conditions
  • Curvature conditions
  • The idea behind backtracking line search (a sketch follows below)
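Since the last bullet only names the idea, here is a minimal backtracking sketch built around the sufficient decrease (Armijo) condition. It is an illustration with assumed parameter names (alpha0, rho, c1), not the course implementation:

```python
import numpy as np

def backtracking_line_search(f, grad_f, x, d, alpha0=1.0, rho=0.5, c1=1e-4):
    """Shrink the step until the sufficient decrease (Armijo) condition holds:
    f(x + alpha d) <= f(x) + c1 * alpha * <grad f(x), d>.
    Parameter defaults are illustrative, not taken from the course code."""
    alpha = alpha0
    fx, slope = f(x), grad_f(x) @ d          # slope < 0 for a descent direction d
    while f(x + alpha * d) > fx + c1 * alpha * slope:
        alpha *= rho                         # backtrack: shrink the step by the factor rho
    return alpha

# Toy usage: one steepest-descent step on a simple quadratic
A = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x @ A @ x
grad_f = lambda x: A @ x
x = np.array([1.0, 1.0])
d = -grad_f(x)                               # steepest-descent direction
alpha = backtracking_line_search(f, grad_f, x, d)
print(alpha, f(x + alpha * d) < f(x))        # the accepted step decreases f
```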
Matrix Calculus. Problem 1

Example

Find ∇f(x), if f(x) = ½ xᵀAx + bᵀx + c.

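Not the seminar's official solution, but one way to sanity-check a candidate answer: the rules above give df = ½⟨(A + Aᵀ)x, dx⟩ + ⟨b, dx⟩, so the candidate gradient is ∇f(x) = ½(A + Aᵀ)x + b (equal to Ax + b when A is symmetric). A finite-difference check in NumPy, with arbitrary test data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A, b, c = rng.standard_normal((n, n)), rng.standard_normal(n), 1.5
x, dx = rng.standard_normal(n), 1e-6 * rng.standard_normal(n)

f = lambda x: 0.5 * x @ A @ x + b @ x + c
grad = 0.5 * (A + A.T) @ x + b       # candidate gradient, to be verified

# f(x + dx) − f(x) should match ⟨∇f(x), dx⟩ up to o(‖dx‖)
print(f(x + dx) - f(x), grad @ dx)
```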


Matrix Calculus. Problem 2

Example

Find ∇f(X), if f(X) = tr(AX⁻¹B).

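Again, a sketch of a check rather than the official solution: using d(X⁻¹) = −X⁻¹(dX)X⁻¹ and the cyclic property of the trace, df = −tr(X⁻¹BAX⁻¹ dX), which suggests the candidate ∇f(X) = −(X⁻¹BAX⁻¹)ᵀ. A numerical check with arbitrary test matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
X = rng.standard_normal((n, n)) + n * np.eye(n)   # keep X well-conditioned
dX = 1e-6 * rng.standard_normal((n, n))

f = lambda X: np.trace(A @ np.linalg.inv(X) @ B)
Xinv = np.linalg.inv(X)
grad = -(Xinv @ B @ A @ Xinv).T                   # candidate gradient, to be verified

# ⟨∇f(X), dX⟩ = tr(∇f(X)ᵀ dX) should match f(X + dX) − f(X) up to o(‖dX‖)
print(f(X + dX) - f(X), np.trace(grad.T @ dX))
```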


Matrix Calculus. Problem 3

Example

Find the gradient ∇f(x) and Hessian ∇²f(x), if f(x) = ⅓‖x‖₂³.

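As before, a candidate answer with a numerical check (a sketch, not the official solution): writing f(x) = ⅓(xᵀx)^(3/2), the chain rule gives the candidate ∇f(x) = ‖x‖₂ x, and differentiating once more, ∇²f(x) = ‖x‖₂ I + xxᵀ/‖x‖₂ for x ≠ 0:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
x = rng.standard_normal(n)
dx = 1e-4 * rng.standard_normal(n)

f = lambda x: np.linalg.norm(x) ** 3 / 3
nx = np.linalg.norm(x)
grad = nx * x                                   # candidate gradient ‖x‖ x
hess = nx * np.eye(n) + np.outer(x, x) / nx     # candidate Hessian ‖x‖ I + x xᵀ / ‖x‖

print(f(x + dx) - f(x), grad @ dx)              # first-order check
print(f(x + dx) - f(x) - grad @ dx, 0.5 * dx @ hess @ dx)   # second-order check
```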


Line Search. Example 1: Comparison of Methods (Colab ♣)

f₁(x) = x(x − 2)(x + 2)² + 10,   [a, b] = [−3, 2]

• Random search: 72 function calls, 36 iterations, f₁* = 0.09
• Binary search: 23 function calls, 13 iterations, f₁* = 10.00
• Golden search: 19 function calls, 18 iterations, f₁* = 10.00
• Parabolic search: 20 function calls, 17 iterations, f₁* = 10.00

Figure 1: Comparison of different line search algorithms with f₁



Line Search. Example 1: Comparison of Methods (Colab ♣)
r x2 Random search: 68 function calls. 34 iterations. f2∗ = 0.71
2 x2 e− 8
f2 (x) = − Binary search: 23 function calls. 13 iterations. f2∗ = 0.71
π 8 Golden search: 20 function calls. 19 iterations. f2∗ = 0.71
[a, b] = [0, 6] Parabolic search: 17 function calls. 14 iterations. f2∗ = 0.71

Figure 2: Comparison of different line search algorithms with f2



Line Search. Example 1: Comparison of Methods (Colab ♣)

f₃(x) = sin(sin(sin(√(x/2)))),   [a, b] = [5, 70]

• Random search: 66 function calls, 33 iterations, f₃* = 0.25
• Binary search: 32 function calls, 17 iterations, f₃* = 0.25
• Golden search: 25 function calls, 24 iterations, f₃* = 0.25
• Parabolic search: 103 function calls, 100 iterations, f₃* = 0.25

Figure 3: Comparison of different line search algorithms with f₃

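The exact implementations live in the Colab notebook; as a rough, stand-alone sketch of such a comparison, here is a plain golden-section search counting its function calls next to SciPy's bounded Brent-type minimizer on f₁ (call counts and minima will not match the slides exactly, and neither routine is the notebook's code):

```python
import numpy as np
from scipy.optimize import minimize_scalar

calls = 0
def f1(x):
    global calls
    calls += 1
    return x * (x - 2) * (x + 2) ** 2 + 10

def golden_section(f, a, b, tol=1e-5):
    """Plain golden-section search on [a, b]; guaranteed only for unimodal f."""
    invphi = (np.sqrt(5) - 1) / 2                    # 1/φ ≈ 0.618
    x1, x2 = b - invphi * (b - a), a + invphi * (b - a)
    fx1, fx2 = f(x1), f(x2)
    while b - a > tol:
        if fx1 < fx2:                                # minimum lies in [a, x2]
            b, x2, fx2 = x2, x1, fx1
            x1 = b - invphi * (b - a)
            fx1 = f(x1)
        else:                                        # minimum lies in [x1, b]
            a, x1, fx1 = x1, x2, fx2
            x2 = a + invphi * (b - a)
            fx2 = f(x2)
    return (a + b) / 2

calls = 0
x_gs = golden_section(f1, -3, 2)
print("golden section:", x_gs, "calls:", calls)

calls = 0
res = minimize_scalar(f1, bounds=(-3, 2), method="bounded")   # SciPy's bounded Brent-type method
print("bounded Brent:", res.x, "calls:", calls)
```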


Line Search. Example 2: The Brent Method

• Parabolic interpolation + golden section search = Brent method
• The key idea of the method is to track the value of the optimized scalar function at six points a, b, x, w, v, u
• [a, b] − the localization interval at the current iteration
• The points x, w and v are such that the inequality f(x) ⩽ f(w) ⩽ f(v) holds
• u − the minimum of a parabola built on the points x, w and v, or the golden-section point of the larger of the intervals [a, x] and [x, b]

Figure 4: Idea of Brent Method



Line Search. Example 2: The Brent Method

A parabola is constructed only if the points x, w and v are distinct, and its vertex u* is taken as the point u only if
• u* ∈ [a, b]
• u* is at most half the length of the step taken before the previous one away from the point x
• If the conditions above are not met, the point u is found by a golden-section step
• Example in Colab ♣ (see also the SciPy sketch below)

Figure 5: An example of how the Brent Method works

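For comparison with the Colab example, SciPy ships a Brent-type scalar minimizer of exactly this flavor (parabolic steps safeguarded by golden-section steps). A minimal usage sketch on the f₁ from Example 1; the bracket is only a starting guess and the numbers will differ from the notebook:

```python
from scipy.optimize import minimize_scalar

f1 = lambda x: x * (x - 2) * (x + 2) ** 2 + 10

# method='brent' mixes parabolic-interpolation steps with golden-section
# safeguard steps; the bracket is a starting guess, not a hard constraint.
res = minimize_scalar(f1, bracket=(-3, 2), method="brent")
print(res.x, res.fun, res.nfev)
```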
