04 23ECE216 OptimizationWithHessians
23ECE216 Machine Learning
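Necessary conditions for two-variable optimization
➢ $\dfrac{\partial f}{\partial x_1} = 0$ and $\dfrac{\partial f}{\partial x_2} = 0$ at the stationary points.
$$
\nabla_x f =
\begin{bmatrix}
\dfrac{\partial f}{\partial x_1}(X^*) \\[6pt]
\dfrac{\partial f}{\partial x_2}(X^*)
\end{bmatrix}
= 0
$$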
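$$
\mathbf{H} =
\begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} \\[6pt]
\dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \dfrac{\partial^2 f}{\partial x_2^2}
\end{bmatrix}_{[x_1, x_2]}
$$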
Sufficient conditions …contd.
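➢ The Hessian H is evaluated at the stationary point X = [x1, x2] and its eigenvalues are found from $|\lambda I - H| = 0$. Then:
➢ if H is positive definite, the point X = [x1, x2] is a point of local minimum.
➢ if H is negative definite, the point X = [x1, x2] is a point of local maximum.
➢ if H is indefinite (eigenvalues of both signs), the point X = [x1, x2] is a saddle point: neither a maximum nor a minimum. If some eigenvalue is zero and the rest share one sign, this test is inconclusive.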
Reminder
• A square matrix A is called positive definite if it is symmetric (i.e. $A^T = A$) and all its eigenvalues λ are positive, that is, λ > 0.
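The eigenvalue test in the reminder, combined with the sufficient conditions, can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `classify_stationary_point` and the tolerance `tol` are assumptions introduced here, not part of the slides.

```python
import numpy as np

def classify_stationary_point(H, tol=1e-9):
    """Classify a stationary point from the eigenvalues of its Hessian H.

    `tol` is a hypothetical threshold below which an eigenvalue is
    treated as zero.
    """
    H = np.asarray(H, dtype=float)
    if not np.allclose(H, H.T):
        raise ValueError("the Hessian must be symmetric")
    eig = np.linalg.eigvalsh(H)  # real eigenvalues in ascending order
    if np.all(eig > tol):
        return "local minimum"   # positive definite
    if np.all(eig < -tol):
        return "local maximum"   # negative definite
    if eig[0] < -tol and eig[-1] > tol:
        return "saddle point"    # indefinite
    return "inconclusive"        # some eigenvalue is (numerically) zero

if __name__ == "__main__":
    print(classify_stationary_point([[2.0, 0.0], [0.0, 3.0]]))
    print(classify_stationary_point([[-4.0, -2.0], [-2.0, 4.0]]))
```

`numpy.linalg.eigvalsh` is used because it is designed for symmetric matrices and always returns real eigenvalues.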
Example
Consider the function $f(\mathbf{X}) = \tfrac{2}{3}x_1^3 - 2x_1x_2 - 5x_1 + 2x_2^2 + 4x_2 + 5$.
Locate the stationary points of f(X) and classify them as relative
maxima, relative minima or neither.
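Solution
$$f(\mathbf{X}) = \tfrac{2}{3}x_1^3 - 2x_1x_2 - 5x_1 + 2x_2^2 + 4x_2 + 5$$
$$
\nabla_x f =
\begin{bmatrix}
\dfrac{\partial f}{\partial x_1}(X^*) \\[6pt]
\dfrac{\partial f}{\partial x_2}(X^*)
\end{bmatrix}
=
\begin{bmatrix}
2x_1^2 - 2x_2 - 5 \\
-2x_1 + 4x_2 + 4
\end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \end{bmatrix}
$$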
Example …contd.
Solution contd..
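From $\dfrac{\partial f}{\partial x_2}(X) = 0$ we get $x_1 = 2x_2 + 2$. Substituting this into $\dfrac{\partial f}{\partial x_1}(X) = 0$ gives
$$2(2x_2 + 2)^2 - 2x_2 - 5 = 0$$
$$8x_2^2 + 14x_2 + 3 = 0$$
$$(2x_2 + 3)(4x_2 + 1) = 0$$
$$x_2 = -3/2 \quad \text{or} \quad x_2 = -1/4$$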
Substitute these values in the previous equations to get two
corresponding values for x1
So the two stationary points are
X1 = [-1,-3/2] and X2 = [3/2,-1/4]
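As a quick numerical check of the two stationary points, the quadratic in x2 can be solved with NumPy and the results substituted back into the gradient. This is a minimal sketch; the names `grad` and `stationary_points` are illustrative choices, not from the slides.

```python
import numpy as np

def grad(x):
    """Gradient of f(X) = 2*x1^3/3 - 2*x1*x2 - 5*x1 + 2*x2^2 + 4*x2 + 5."""
    x1, x2 = x
    return np.array([2 * x1**2 - 2 * x2 - 5,
                     -2 * x1 + 4 * x2 + 4])

# Setting df/dx2 = 0 gives x1 = 2*x2 + 2; substituting into df/dx1 = 0
# gives the quadratic 8*x2^2 + 14*x2 + 3 = 0.
x2_roots = np.sort(np.roots([8.0, 14.0, 3.0]).real)
stationary_points = [np.array([2 * x2 + 2, x2]) for x2 in x2_roots]

if __name__ == "__main__":
    for p in stationary_points:
        # The gradient should vanish (up to rounding) at each point.
        print("point:", p, "gradient:", grad(p))
```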
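Example …contd.
Solution Contd..
$$
\frac{\partial^2 f}{\partial x_1^2} = 4x_1; \quad
\frac{\partial^2 f}{\partial x_2^2} = 4; \quad
\frac{\partial^2 f}{\partial x_1 \partial x_2} = \frac{\partial^2 f}{\partial x_2 \partial x_1} = -2
$$
$$
|\lambda I - H| =
\begin{vmatrix}
\lambda - 4x_1 & 2 \\
2 & \lambda - 4
\end{vmatrix}
$$
At X1 = [-1,-3/2]:
$$
|\lambda I - H| =
\begin{vmatrix}
\lambda + 4 & 2 \\
2 & \lambda - 4
\end{vmatrix}
= (\lambda + 4)(\lambda - 4) - 4 = 0
$$
At X2 = [3/2,-1/4]:
$$
|\lambda I - H| =
\begin{vmatrix}
\lambda - 6 & 2 \\
2 & \lambda - 4
\end{vmatrix}
= (\lambda - 6)(\lambda - 4) - 4 = 0
$$
$$\lambda_1 = 5 + \sqrt{5}, \quad \lambda_2 = 5 - \sqrt{5}$$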
Example
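Maximize $f(\mathbf{X}) = 20 + 2x_1 - x_1^2 + 6x_2 - \tfrac{3}{2}x_2^2$
Solution
$$
\nabla_x f =
\begin{bmatrix}
\dfrac{\partial f}{\partial x_1}(X^*) \\[6pt]
\dfrac{\partial f}{\partial x_2}(X^*)
\end{bmatrix}
=
\begin{bmatrix}
2 - 2x_1 \\
6 - 3x_2
\end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \end{bmatrix}
\quad \Rightarrow \quad X^* = [1, 2]
$$
$$
\frac{\partial^2 f}{\partial x_1^2} = -2; \quad
\frac{\partial^2 f}{\partial x_2^2} = -3; \quad
\frac{\partial^2 f}{\partial x_1 \partial x_2} = 0; \quad
H = \begin{bmatrix} -2 & 0 \\ 0 & -3 \end{bmatrix}
$$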
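Example …contd.
$$
|\lambda I - H| =
\begin{vmatrix}
\lambda + 2 & 0 \\
0 & \lambda + 3
\end{vmatrix}
= (\lambda + 2)(\lambda + 3) = 0
$$
$$\lambda_1 = -2 \quad \text{and} \quad \lambda_2 = -3$$
Both eigenvalues are negative, so H is negative definite and X* = [1, 2] is a maximum of f.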
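Because this f is quadratic, its gradient is linear in X and can be written as H X + b, so the stationary point comes from a single linear solve. The sketch below assumes that rewriting; the names `H`, `b`, `x_star` and `f` are illustrative.

```python
import numpy as np

def f(x):
    """f(X) = 20 + 2*x1 - x1^2 + 6*x2 - 3*x2^2/2."""
    x1, x2 = x
    return 20 + 2 * x1 - x1**2 + 6 * x2 - 1.5 * x2**2

# Gradient of f is H @ X + b with the constant Hessian H below.
H = np.array([[-2.0, 0.0],
              [0.0, -3.0]])
b = np.array([2.0, 6.0])

x_star = np.linalg.solve(H, -b)       # solves H X = -b, i.e. gradient = 0
eigenvalues = np.linalg.eigvalsh(H)   # ascending order

if __name__ == "__main__":
    print("stationary point:", x_star)
    print("eigenvalues of H:", eigenvalues)
    print("f at the stationary point:", f(x_star))
```

Since H does not depend on X here, the eigenvalue signs hold everywhere, not only at the stationary point.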
Functions of two variables
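• A function of two variables, f(X), where X = [x1, x2] is a vector, is strictly convex if, for any two distinct points $\mathbf{X}_a$ and $\mathbf{X}_b$ and any $0 < \theta < 1$,
$$f(\theta \mathbf{X}_a + (1 - \theta)\mathbf{X}_b) < \theta f(\mathbf{X}_a) + (1 - \theta) f(\mathbf{X}_b)$$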
Contour plot of a convex function
Contour plot of a concave function
Sufficient conditions
• To determine the convexity or concavity of a function of multiple variables, the eigenvalues of its Hessian matrix are examined, and the following rules apply.
• If all eigenvalues of the Hessian are positive, the function is strictly convex.
• If all eigenvalues of the Hessian are negative, the function is strictly concave.
• If some eigenvalues are positive and some are negative, the function is neither convex nor concave. If some eigenvalues are zero, this test cannot establish strict convexity or strict concavity.
Example
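Locate the stationary points of f(X) and find out if the function is convex, concave or neither at the points of optima.
$$f(\mathbf{X}) = \tfrac{2}{3}x_1^3 - 2x_1x_2 - 5x_1 + 2x_2^2 + 4x_2 + 5$$
Solution:
$$
\nabla_x f =
\begin{bmatrix}
\dfrac{\partial f}{\partial x_1}(X^*) \\[6pt]
\dfrac{\partial f}{\partial x_2}(X^*)
\end{bmatrix}
=
\begin{bmatrix}
2x_1^2 - 2x_2 - 5 \\
-2x_1 + 4x_2 + 4
\end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \end{bmatrix}
$$
$$X_1 = \left[-1, -\tfrac{3}{2}\right], \quad X_2 = \left[\tfrac{3}{2}, -\tfrac{1}{4}\right]$$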
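The Hessian is calculated as follows:
$$
\frac{\partial^2 f(X)}{\partial x_1^2} = 4x_1, \quad
\frac{\partial^2 f(X)}{\partial x_2^2} = 4, \quad
\frac{\partial^2 f(X)}{\partial x_1 \partial x_2} = \frac{\partial^2 f(X)}{\partial x_2 \partial x_1} = -2
$$
$$H = \begin{bmatrix} 4x_1 & -2 \\ -2 & 4 \end{bmatrix}$$
$$
|\lambda I - H| =
\begin{vmatrix}
\lambda - 4x_1 & 2 \\
2 & \lambda - 4
\end{vmatrix}
= 0
$$
i.e. at $X_1$
$$
|\lambda I - H| =
\begin{vmatrix}
\lambda + 4 & 2 \\
2 & \lambda - 4
\end{vmatrix}
= (\lambda + 4)(\lambda - 4) - 2 \cdot 2 = 0
$$
$$\lambda^2 - 16 - 4 = 0$$
$$\lambda^2 = 20 \quad \text{or} \quad \lambda = +\sqrt{20}, \; -\sqrt{20}$$
Since one eigenvalue is positive and the other negative, the point $X_1$ is a saddle point.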
Example (contd..)
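i.e. at $X_2$
$$
|\lambda I - H| =
\begin{vmatrix}
\lambda - 6 & 2 \\
2 & \lambda - 4
\end{vmatrix}
= (\lambda - 6)(\lambda - 4) - 2 \cdot 2 = 0
$$
$$\lambda^2 - 10\lambda + 24 - 4 = 0$$
$$\lambda^2 - 10\lambda + 20 = 0$$
$$\lambda = 5 + \sqrt{5}, \; 5 - \sqrt{5}$$
Since both eigenvalues are positive, H is positive definite at $X_2$, so $X_2$ is a point of relative minimum and the function is locally convex there.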
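The hand-computed eigenvalues at the two stationary points can be cross-checked numerically. The sketch below builds the Hessian from the second derivatives given for this example; the helper name `hessian` and the variable names are illustrative.

```python
import numpy as np

def hessian(x):
    """Hessian of f(X) = 2*x1^3/3 - 2*x1*x2 - 5*x1 + 2*x2^2 + 4*x2 + 5."""
    x1, _ = x
    return np.array([[4.0 * x1, -2.0],
                     [-2.0, 4.0]])

X1 = np.array([-1.0, -1.5])
X2 = np.array([1.5, -0.25])

eig_X1 = np.linalg.eigvalsh(hessian(X1))  # ascending order
eig_X2 = np.linalg.eigvalsh(hessian(X2))

if __name__ == "__main__":
    print("eigenvalues at X1:", eig_X1)   # mixed signs: saddle point
    print("eigenvalues at X2:", eig_X2)   # both positive: relative minimum
```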
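Necessary condition
$$
\nabla_x f =
\begin{bmatrix}
\dfrac{\partial f}{\partial x_1}(X^*) \\[6pt]
\dfrac{\partial f}{\partial x_2}(X^*) \\
\vdots \\
\dfrac{\partial f}{\partial x_n}(X^*)
\end{bmatrix}
= 0
$$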
Sufficient condition
➢ For a stationary point X* to be an extreme point, the matrix of
second partial derivatives (Hessian matrix) of f(X) evaluated at X*
must be:
➢ positive definite when X* is a point of relative minimum, and
➢ negative definite when X* is a relative maximum point.
➢ When all eigenvalues of the Hessian are negative for all possible values of X, X* is a global maximum; when all eigenvalues are positive for all possible values of X, X* is a global minimum.
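For functions of n variables, working out every second partial derivative by hand becomes tedious. One possible shortcut, which is not part of the slides, is to approximate the Hessian by central finite differences and then inspect its eigenvalues. In the sketch below the helper `numerical_hessian`, the step size `h`, and the test function `f` are all assumptions chosen for illustration.

```python
import numpy as np

def numerical_hessian(f, x, h=1e-3):
    """Central-difference approximation of the Hessian of f at x.

    `h` is a hypothetical step size; it trades truncation error
    against floating-point rounding error.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        ei = np.zeros(n)
        ei[i] = h
        for j in range(n):
            ej = np.zeros(n)
            ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return H

def f(x):
    """Illustrative three-variable quadratic used as a test function."""
    x1, x2, x3 = x
    return (-x1**2 - x2**2 - x3**2 + 2 * x1 * x2 + 2 * x1 * x3
            + 4 * x1 - 5 * x3 + 2)

H_approx = numerical_hessian(f, [0.5, 0.5, -2.0])
eigenvalues = np.linalg.eigvalsh(0.5 * (H_approx + H_approx.T))

if __name__ == "__main__":
    print(np.round(H_approx, 6))
    print("eigenvalues:", eigenvalues)
```

The approximate Hessian is symmetrized before calling `eigvalsh` so that rounding noise in the off-diagonal entries does not matter.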
Example
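Analyze the function $f(\mathbf{X}) = -x_1^2 - x_2^2 - x_3^2 + 2x_1x_2 + 2x_1x_3 + 4x_1 - 5x_3 + 2$ and classify the stationary points as maxima, minima and points of inflection.
Solution
$$
\nabla_x f =
\begin{bmatrix}
\dfrac{\partial f}{\partial x_1}(X^*) \\[6pt]
\dfrac{\partial f}{\partial x_2}(X^*) \\[6pt]
\dfrac{\partial f}{\partial x_3}(X^*)
\end{bmatrix}
=
\begin{bmatrix}
-2x_1 + 2x_2 + 2x_3 + 4 \\
-2x_2 + 2x_1 \\
-2x_3 + 2x_1 - 5
\end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
$$
Solving these three linear equations gives the single stationary point X* = [1/2, 1/2, −2].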
Example …contd.
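Hessian of f(X) is:
$$H = \left[\frac{\partial^2 f}{\partial x_i \partial x_j}\right]$$
$$
H =
\begin{bmatrix}
-2 & 2 & 2 \\
2 & -2 & 0 \\
2 & 0 & -2
\end{bmatrix}
$$
$$
|\lambda I - H| =
\begin{vmatrix}
\lambda + 2 & -2 & -2 \\
-2 & \lambda + 2 & 0 \\
-2 & 0 & \lambda + 2
\end{vmatrix}
= 0
$$
Expanding the determinant gives $(\lambda + 2)\left[(\lambda + 2)^2 - 8\right] = 0$, so $\lambda = -2$, $-2 + 2\sqrt{2}$, $-2 - 2\sqrt{2}$. One eigenvalue is positive and two are negative, so H is indefinite and the stationary point is a saddle point: neither a maximum nor a minimum.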
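The three-variable example can also be checked numerically. Since f is quadratic, its gradient can be written as H X + b, so the stationary point follows from one linear solve and the classification from the eigenvalues of H. This is a sketch under that rewriting; the names `H`, `b`, `x_star` and `eigenvalues` are illustrative.

```python
import numpy as np

# Gradient of f is H @ X + b, read off from the three gradient equations.
H = np.array([[-2.0, 2.0, 2.0],
              [2.0, -2.0, 0.0],
              [2.0, 0.0, -2.0]])
b = np.array([4.0, 0.0, -5.0])

x_star = np.linalg.solve(H, -b)       # gradient = 0  <=>  H X = -b
eigenvalues = np.linalg.eigvalsh(H)   # ascending order

if __name__ == "__main__":
    print("stationary point:", x_star)
    print("eigenvalues of H:", eigenvalues)
    if eigenvalues[0] < 0 < eigenvalues[-1]:
        print("H is indefinite: the stationary point is a saddle point")
```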
Thank you