
Om Namo Bhagavate Vasudevaya

23ECE216 Machine Learning

Optimizing functions of many variables

Dr. Binoy B Nair (compiled from Optimization Methods by D Nagesh Kumar, IISc)
Necessary conditions for two-variable optimization

➢ At the stationary points, $\dfrac{\partial f}{\partial x_1} = 0$ and $\dfrac{\partial f}{\partial x_2} = 0$.

➢ i.e. the gradient vector of f(X), $\nabla_x f$, at X = X* = [x1, x2], defined as follows, must equal zero:

$$\nabla_x f = \begin{bmatrix} \dfrac{\partial f}{\partial x_1}(X^*) \\[2mm] \dfrac{\partial f}{\partial x_2}(X^*) \end{bmatrix} = 0$$

This is the necessary condition.
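As a quick illustration in code (a minimal sketch, assuming SymPy is available; the function f below is an illustrative choice, not one from these slides), the stationary points can be located by solving $\nabla_x f = 0$ symbolically:

```python
# A minimal sketch, assuming SymPy: locate stationary points of an
# illustrative two-variable function by solving grad f = 0.
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = x1**2 + x2**2 - 2*x1 - 4*x2            # illustrative f, not from the slides
grad = [sp.diff(f, v) for v in (x1, x2)]   # [df/dx1, df/dx2]
print(sp.solve(grad, [x1, x2], dict=True)) # [{x1: 1, x2: 2}]
```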


Sufficient conditions
➢ Consider the following second-order derivatives:

$$\frac{\partial^2 f}{\partial x_1^2};\quad \frac{\partial^2 f}{\partial x_2^2};\quad \frac{\partial^2 f}{\partial x_1 \partial x_2}$$

➢ The Hessian matrix H is built from these second-order derivatives:

$$\mathbf{H} = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} \\[2mm] \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \dfrac{\partial^2 f}{\partial x_2^2} \end{bmatrix}_{[x_1, x_2]}$$
Sufficient conditions …contd.
➢ The definiteness of H at the stationary point is then examined:
➢ if H is positive definite, the point X = [x1, x2] is a point of local minimum;
➢ if H is negative definite, the point X = [x1, x2] is a point of local maximum;
➢ if H is neither, the point X = [x1, x2] is neither a point of maximum nor minimum.
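A minimal sketch of this classification rule, assuming NumPy (the helper name `classify` is hypothetical):

```python
# A sketch, assuming NumPy: classify a stationary point from the
# eigenvalues of its (symmetric) Hessian.
import numpy as np

def classify(H):
    eig = np.linalg.eigvalsh(H)  # eigvalsh: eigenvalues of a symmetric matrix
    if np.all(eig > 0):
        return "local minimum (H positive definite)"
    if np.all(eig < 0):
        return "local maximum (H negative definite)"
    return "neither a maximum nor a minimum"

print(classify(np.array([[2.0, 0.0], [0.0, 3.0]])))    # local minimum
print(classify(np.array([[4.0, -2.0], [-2.0, -1.0]]))) # neither
```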

Reminder
• A square matrix A is called positive definite if it is symmetric (i.e. $A^T = A$) and all its eigenvalues λ are positive, that is λ > 0.

• A square matrix A is called negative definite if it is symmetric (i.e. $A^T = A$) and all its eigenvalues λ are negative, that is λ < 0.
Example
Consider the function $f(\mathbf{X}) = 2x_1^3/3 - 2x_1 x_2 - 5x_1 + 2x_2^2 + 4x_2 + 5$.
Locate the stationary points of f(X) and classify them as relative maxima, relative minima or neither.

Solution

$$\nabla_x f = \begin{bmatrix} \dfrac{\partial f}{\partial x_1}(X^*) \\[2mm] \dfrac{\partial f}{\partial x_2}(X^*) \end{bmatrix} = \begin{bmatrix} 2x_1^2 - 2x_2 - 5 \\ -2x_1 + 4x_2 + 4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
Example …contd.
Solution contd..

From $\partial f/\partial x_2 = 0$, $x_1 = 2x_2 + 2$. Substituting this into $\partial f/\partial x_1 = 0$ gives

$$8x_2^2 + 14x_2 + 3 = 0$$
$$(2x_2 + 3)(4x_2 + 1) = 0$$

so $x_2 = -3/2$ or $x_2 = -1/4$.

Substituting these values back into $x_1 = 2x_2 + 2$ gives the corresponding values $x_1 = -1$ and $x_1 = 3/2$.
So the two stationary points are
X1 = [-1, -3/2] and X2 = [3/2, -1/4]
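A quick cross-check of these roots, assuming SymPy is available:

```python
# Cross-check, assuming SymPy: solve the two stationary-point
# equations from this example simultaneously.
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
eqs = [2*x1**2 - 2*x2 - 5, -2*x1 + 4*x2 + 4]
print(sp.solve(eqs, [x1, x2]))  # [(-1, -3/2), (3/2, -1/4)]
```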

Example …contd.
Solution contd..

$$\frac{\partial^2 f}{\partial x_1^2} = 4x_1;\quad \frac{\partial^2 f}{\partial x_2^2} = 4;\quad \frac{\partial^2 f}{\partial x_1 \partial x_2} = \frac{\partial^2 f}{\partial x_2 \partial x_1} = -2$$

The Hessian of f(X) is

$$\mathbf{H} = \begin{bmatrix} 4x_1 & -2 \\ -2 & 4 \end{bmatrix}$$

$$|\lambda \mathbf{I} - \mathbf{H}| = \begin{vmatrix} \lambda - 4x_1 & 2 \\ 2 & \lambda - 4 \end{vmatrix}$$

At X1 = [-1, -3/2],

$$|\lambda \mathbf{I} - \mathbf{H}| = \begin{vmatrix} \lambda + 4 & 2 \\ 2 & \lambda - 4 \end{vmatrix} = (\lambda + 4)(\lambda - 4) - 4 = \lambda^2 - 20 = 0$$

$$\lambda_1 = +\sqrt{20},\quad \lambda_2 = -\sqrt{20}$$

Since one eigenvalue is positive and one negative, X1 is neither a relative maximum nor a relative minimum.
Example …contd.
Solution contd..

At X2 = [3/2, -1/4],

$$|\lambda \mathbf{I} - \mathbf{H}| = \begin{vmatrix} \lambda - 6 & 2 \\ 2 & \lambda - 4 \end{vmatrix} = (\lambda - 6)(\lambda - 4) - 4 = 0$$

$$\lambda_1 = 5 + \sqrt{5},\quad \lambda_2 = 5 - \sqrt{5}$$

Since both eigenvalues are positive, X2 is a local minimum.

The minimum value of f(X) is f(X2) = -0.375.
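These results can be confirmed numerically, assuming NumPy:

```python
# Numerical confirmation, assuming NumPy: eigenvalues of H at each
# stationary point, and the value of f at the minimum.
import numpy as np

f = lambda x1, x2: 2*x1**3/3 - 2*x1*x2 - 5*x1 + 2*x2**2 + 4*x2 + 5
H = lambda x1: np.array([[4*x1, -2.0], [-2.0, 4.0]])

print(np.linalg.eigvalsh(H(-1.0)))  # approx [-4.47, 4.47]: saddle at X1
print(np.linalg.eigvalsh(H(1.5)))   # approx [2.76, 7.24]: minimum at X2
print(f(1.5, -0.25))                # -0.375
```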

Example
Maximize $f(\mathbf{X}) = 20 + 2x_1 - x_1^2 + 6x_2 - 3x_2^2/2$

Solution

$$\nabla_x f = \begin{bmatrix} \dfrac{\partial f}{\partial x_1}(X^*) \\[2mm] \dfrac{\partial f}{\partial x_2}(X^*) \end{bmatrix} = \begin{bmatrix} 2 - 2x_1 \\ 6 - 3x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad\Rightarrow\quad X^* = [1, 2]$$

$$\frac{\partial^2 f}{\partial x_1^2} = -2;\quad \frac{\partial^2 f}{\partial x_2^2} = -3;\quad \frac{\partial^2 f}{\partial x_1 \partial x_2} = 0;\qquad \mathbf{H} = \begin{bmatrix} -2 & 0 \\ 0 & -3 \end{bmatrix}$$
Example …contd.

$$|\lambda \mathbf{I} - \mathbf{H}| = \begin{vmatrix} \lambda + 2 & 0 \\ 0 & \lambda + 3 \end{vmatrix} = (\lambda + 2)(\lambda + 3) = 0$$

$$\lambda_1 = -2 \quad\text{and}\quad \lambda_2 = -3$$

Since both eigenvalues are negative (and H is constant, so this holds everywhere), f(X) is concave with a global maximum of f(X*) = 27.
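A quick numerical check, assuming NumPy:

```python
# Quick check, assuming NumPy: H is constant here, so negative
# eigenvalues everywhere give a global maximum.
import numpy as np

f = lambda x1, x2: 20 + 2*x1 - x1**2 + 6*x2 - 3*x2**2/2
H = np.array([[-2.0, 0.0], [0.0, -3.0]])
print(np.linalg.eigvalsh(H))  # [-3., -2.]: negative definite
print(f(1.0, 2.0))            # 27.0
```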

Functions of two variables
• A function of two variables, f(X), where X is the vector [x1, x2], is strictly convex if

$$f(t\mathbf{X}_1 + (1 - t)\mathbf{X}_2) < t\, f(\mathbf{X}_1) + (1 - t)\, f(\mathbf{X}_2)$$

• for any two distinct points X1 and X2 (located by the coordinates in their respective vectors) and any t with 0 < t < 1.

• Similarly, a two-variable function is strictly concave if

$$f(t\mathbf{X}_1 + (1 - t)\mathbf{X}_2) > t\, f(\mathbf{X}_1) + (1 - t)\, f(\mathbf{X}_2)$$
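A minimal sketch of the strict-convexity inequality along one segment, assuming NumPy (the quadratic f below is an illustrative strictly convex choice, not one from the slides):

```python
# Sketch, assuming NumPy: for a strictly convex f, the chord lies
# strictly above the function between any two points X1 and X2.
import numpy as np

f = lambda X: X[0]**2 + X[1]**2  # illustrative strictly convex function
X1, X2 = np.array([0.0, 1.0]), np.array([2.0, -1.0])
for t in (0.25, 0.5, 0.75):
    lhs = f(t*X1 + (1 - t)*X2)
    rhs = t*f(X1) + (1 - t)*f(X2)
    print(t, lhs < rhs)  # True at every t in (0, 1)
```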

Contour plot of a convex function (figure omitted)

Contour plot of a concave function (figure omitted)
Sufficient conditions
• To determine the convexity or concavity of a function of multiple variables, the eigenvalues of its Hessian matrix are examined, and the following rules apply:
• If all eigenvalues of the Hessian are positive, the function is strictly convex.
• If all eigenvalues of the Hessian are negative, the function is strictly concave.
• If some eigenvalues are positive and some are negative, or if some are zero, the function is neither strictly concave nor strictly convex.

Example
Locate the stationary points of f(X) and find out if the function is convex, concave or neither at the points of optima.

$$f(\mathbf{X}) = 2x_1^3/3 - 2x_1 x_2 - 5x_1 + 2x_2^2 + 4x_2 + 5$$

Solution:

$$\nabla_x f = \begin{bmatrix} \dfrac{\partial f}{\partial x_1}(X^*) \\[2mm] \dfrac{\partial f}{\partial x_2}(X^*) \end{bmatrix} = \begin{bmatrix} 2x_1^2 - 2x_2 - 5 \\ -2x_1 + 4x_2 + 4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

As before, the stationary points are $X_1 = [-1, -3/2]$ and $X_2 = [3/2, -1/4]$.
The Hessian is calculated as follows:

$$\frac{\partial^2 f}{\partial x_1^2} = 4x_1,\quad \frac{\partial^2 f}{\partial x_2^2} = 4,\quad \frac{\partial^2 f}{\partial x_1 \partial x_2} = \frac{\partial^2 f}{\partial x_2 \partial x_1} = -2$$

$$\mathbf{H} = \begin{bmatrix} 4x_1 & -2 \\ -2 & 4 \end{bmatrix}$$

$$|\lambda \mathbf{I} - \mathbf{H}| = \begin{vmatrix} \lambda - 4x_1 & 2 \\ 2 & \lambda - 4 \end{vmatrix} = 0$$

i.e. at $X_1$:

$$|\lambda \mathbf{I} - \mathbf{H}| = \begin{vmatrix} \lambda + 4 & 2 \\ 2 & \lambda - 4 \end{vmatrix} = (\lambda + 4)(\lambda - 4) - 2 \cdot 2 = 0$$
$$\lambda^2 - 16 - 4 = 0$$
$$\lambda^2 = 20,\quad \text{i.e. } \lambda = +\sqrt{20},\ -\sqrt{20}$$

Since one eigenvalue is positive and another negative, the point $X_1$ is a saddle point.
Example (contd..)

i.e. at $X_2$:

$$|\lambda \mathbf{I} - \mathbf{H}| = \begin{vmatrix} \lambda - 6 & 2 \\ 2 & \lambda - 4 \end{vmatrix} = (\lambda - 6)(\lambda - 4) - 2 \cdot 2 = 0$$
$$\lambda^2 - 10\lambda + 24 - 4 = 0$$
$$\lambda^2 - 10\lambda + 20 = 0$$
$$\lambda = 5 + \sqrt{5},\ 5 - \sqrt{5}$$

Since both eigenvalues are positive, the point $X_2$ is a local minimum and the function is convex at this point.
Necessary condition

• In the case of multivariable functions, a necessary condition for a stationary point of the function f(X) is that each partial derivative is equal to zero.
• In other words, each element of the gradient vector defined below must be equal to zero, i.e. the gradient vector of f(X), $\nabla_x f$ at X = X*, defined as follows, must equal zero:

$$\nabla_x f = \begin{bmatrix} \dfrac{\partial f}{\partial x_1}(X^*) \\[2mm] \dfrac{\partial f}{\partial x_2}(X^*) \\ \vdots \\ \dfrac{\partial f}{\partial x_n}(X^*) \end{bmatrix} = 0$$
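As a sketch of this condition for general n, assuming NumPy (the helper `grad` is hypothetical and uses central differences rather than symbolic derivatives):

```python
# Sketch, assuming NumPy: central-difference gradient for an
# n-variable f; at a stationary point it should be near zero.
import numpy as np

def grad(f, x, h=1e-6):
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

f = lambda x: x[0]**2 + 2*x[1]**2 + 3*x[2]**2  # illustrative f
print(grad(f, np.array([0.0, 0.0, 0.0])))      # approx [0, 0, 0]
```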
Sufficient condition
➢ For a stationary point X* to be an extreme point, the matrix of second partial derivatives (the Hessian matrix) of f(X) evaluated at X* must be:
➢ positive definite when X* is a point of relative minimum, and
➢ negative definite when X* is a relative maximum point.

➢ When all eigenvalues are negative for all possible values of X, then X* is a global maximum, and when all eigenvalues are positive for all possible values of X, then X* is a global minimum.

➢ If some of the eigenvalues of the Hessian at X* are positive and some negative, or if some are zero, the stationary point X* is neither a local maximum nor a local minimum.
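Continuing the finite-difference sketch above, assuming NumPy (the helper `hessian` is hypothetical): its eigenvalues at a stationary point supply this sufficient condition.

```python
# Sketch, assuming NumPy: central-difference Hessian; its eigenvalues
# at a stationary point give the sufficient condition.
import numpy as np

def hessian(f, x, h=1e-4):
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

f = lambda x: x[0]**2 + 2*x[1]**2                    # illustrative f
print(np.linalg.eigvalsh(hessian(f, np.zeros(2))))   # approx [2., 4.]
```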

Example
Analyze the function $f(\mathbf{X}) = -x_1^2 - x_2^2 - x_3^2 + 2x_1 x_2 + 2x_1 x_3 + 4x_1 - 5x_3 + 2$ and classify the stationary points as maxima, minima and points of inflection.

Solution

$$\nabla_x f = \begin{bmatrix} \dfrac{\partial f}{\partial x_1}(X^*) \\[2mm] \dfrac{\partial f}{\partial x_2}(X^*) \\[2mm] \dfrac{\partial f}{\partial x_3}(X^*) \end{bmatrix} = \begin{bmatrix} -2x_1 + 2x_2 + 2x_3 + 4 \\ -2x_2 + 2x_1 \\ -2x_3 + 2x_1 - 5 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
Example …contd.

Solving the system: the second equation gives $x_2 = x_1$ and the third gives $x_3 = x_1 - 5/2$. Substituting both into the first equation, $-2x_1 + 2x_1 + 2(x_1 - 5/2) + 4 = 2x_1 - 1 = 0$, so $x_1 = 1/2$.

The stationary point is therefore $X^* = [1/2,\ 1/2,\ -2]$.
Example …contd.
Hessian of f(X) is:

$$\mathbf{H} = \left[ \frac{\partial^2 f}{\partial x_i \partial x_j} \right]$$

$$\mathbf{H} = \begin{bmatrix} -2 & 2 & 2 \\ 2 & -2 & 0 \\ 2 & 0 & -2 \end{bmatrix}$$

$$|\lambda \mathbf{I} - \mathbf{H}| = \begin{vmatrix} \lambda + 2 & -2 & -2 \\ -2 & \lambda + 2 & 0 \\ -2 & 0 & \lambda + 2 \end{vmatrix} = 0$$

This factors as $(\lambda + 2)\left[(\lambda + 2)^2 - 8\right] = 0$, giving $\lambda = -2,\ -2 + 2\sqrt{2},\ -2 - 2\sqrt{2}$. Since the eigenvalues have mixed signs, X* is a saddle point.
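The eigenvalues can be verified numerically, assuming NumPy:

```python
# Verification, assuming NumPy: the Hessian is constant, so its
# eigenvalues decide the nature of X* directly.
import numpy as np

H = np.array([[-2.0,  2.0,  2.0],
              [ 2.0, -2.0,  0.0],
              [ 2.0,  0.0, -2.0]])
print(np.linalg.eigvalsh(H))  # approx [-4.83, -2., 0.83]: mixed signs, saddle
```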

Thank you
