The Hessian and Optimization
Writing $\vec{h} = (x - x_0, y - y_0)$, the Taylor expansion of $f$ at $(x_0, y_0)$ reads
$$
f(x,y) = f(x_0,y_0)
+ \left.\frac{\partial f}{\partial x}\right|_{(x_0,y_0)}\!(x-x_0)
+ \left.\frac{\partial f}{\partial y}\right|_{(x_0,y_0)}\!(y-y_0)
+ \underbrace{\frac{1}{2}\left.\frac{\partial^2 f}{\partial x^2}\right|_{(x_0,y_0)}\!(x-x_0)^2
+ \left.\frac{\partial^2 f}{\partial x\,\partial y}\right|_{(x_0,y_0)}\!(x-x_0)(y-y_0)
+ \frac{1}{2}\left.\frac{\partial^2 f}{\partial y^2}\right|_{(x_0,y_0)}\!(y-y_0)^2}_{\frac{1}{2}\,\vec{h}^{\,t}\,\left.H_f\right|_{(x_0,y_0)}\,\vec{h}}
+ \cdots
$$
As $\vec{h}$ is a column vector, recall that $\vec{h}^{\,t}$ is the transpose of $\vec{h}$, that is, the corresponding row vector, and so the computation becomes
" ∂2 f ∂2 f #
∂ (x − x )
h® H f h® = (x − x0 ) (y − y0 ) ∂∂2 xf ∂ 2 f 0
2
t x∂ y
.
(y − y 0 )
2 ∂ x∂ y ∂ y
The trick is to understand what these second derivatives tell us about the function. The best place to figure out what they mean is at a critical point, that is, a point where $\nabla f = 0$. For simplicity suppose that $x_0 = y_0 = 0$ and $f(x_0, y_0) = 0$, and of course $\nabla f|_{(0,0)} = 0$. Then we have that
$$
f(x,y) = \frac{1}{2}\,\vec{h}^{\,t}\,\left.H_f\right|_{(x_0,y_0)}\,\vec{h} + \cdots,
$$
and so $f$ behaves like a quadratic form near the origin. Suppose that $H_f$ is nondegenerate, that is, suppose that it has no zero eigenvalues. In that case it really behaves like one of three model cases (after a linear change of coordinates).
Two positive eigenvalues:
The model case is when the matrix looks like
$$
\frac{1}{2} H_f = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
$$
(we write half the matrix for simplicity). The function then becomes $x^2 + y^2$, and the graph of the form looks like
[Figure: the graph of $z = x^2 + y^2$ for $-10 \le x, y \le 10$: an upward-opening paraboloid with a minimum at the origin.]
Two negative eigenvalues:
The model case is the mirror image,
$$
\frac{1}{2} H_f = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix},
$$
so the function becomes $-x^2 - y^2$, and the graph is the downward-opening paraboloid:

[Figure: the graph of $z = -x^2 - y^2$ for $-10 \le x, y \le 10$, with a maximum at the origin.]
One positive and one negative eigenvalue:
The model case is when the matrix looks like
$$
\frac{1}{2} H_f = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.
$$
The function then becomes $x^2 - y^2$, and the graph of the form looks like
[Figure: the graph of $z = x^2 - y^2$ for $-10 \le x, y \le 10$: a saddle.]
How do we tell these cases apart without computing eigenvalues? The determinant of $H_f$ is the product of the eigenvalues, so if $\det(H_f) < 0$, the eigenvalues have opposite signs and the critical point is a saddle, while if $\det(H_f) > 0$, the eigenvalues have the same sign and the point is a minimum or a maximum. To decide which, look at the upper left number of the matrix. If it is positive, then the graph of the function must “curve up” in the $x$ direction, so there must be at least one positive eigenvalue; since both eigenvalues have the same sign, both are positive and the point is a minimum.
Let's see an example. Take
$$
f(x,y) = 3x^2 + 7xy + 5y^2.
$$
The origin is a critical point; we just need to decide whether it is a min, a max, or a saddle. The Hessian is
$$
H_f = \begin{bmatrix} 6 & 7 \\ 7 & 10 \end{bmatrix}, \quad\text{and so}\quad \det(H_f) = 6 \times 10 - 7 \times 7 = 11.
$$
So the critical point at the origin is either a local minimum or a maximum. Since the upper left component is $6$, which is positive, the surface curves upwards: both eigenvalues are positive and the point is a minimum for $f$. In fact the eigenvalues are approximately $0.72$ and $15.28$, but notice that we didn't have to compute them.
Now suppose
$$
g(x,y) = 3x^2 + 8xy + 5y^2.
$$
Again the origin is a critical point. The Hessian is
$$
H_g = \begin{bmatrix} 6 & 8 \\ 8 & 10 \end{bmatrix}, \quad\text{so}\quad \det(H_g) = 6 \times 10 - 8 \times 8 = -4.
$$
So the eigenvalues must be of opposite signs, and the origin must be a saddle point. In fact the eigenvalues are approximately $-0.25$ and $16.25$: neither a min nor a max.
Degenerate Hessian
Consider
$$
f(x,y) = x^2 + y^4.
$$
It is not difficult to convince yourself that this function is always positive except at the origin, where it is zero. So the origin is a minimum.
However, if you compute the Hessian you find
$$
H_f = \begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}, \quad\text{so}\quad \det(H_f) = 0.
$$
The matrix is degenerate (it must have a zero eigenvalue since the determinant is
zero). So we can’t quite tell from the second derivative.
To see why we can’t tell, try
$$
g(x,y) = x^2 - y^4.
$$
The graph of this is a saddle (just notice that $g(x,0) = x^2$ and $g(0,y) = -y^4$). If you compute the Hessian you find
$$
H_g = \begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}, \quad\text{so}\quad \det(H_g) = 0.
$$
The Hessian is the same as for $f$ above, so clearly we cannot read much from $H_g$ and $H_f$. One thing that we can read off is that the critical point is definitely not a maximum. That would require two negative eigenvalues, but one of the eigenvalues is in fact $2$, which is positive (the other is zero).
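To see the difference numerically, here is a tiny sketch (nothing beyond NumPy assumed): along the $y$-axis, where the Hessians are blind, $f$ stays nonnegative while $g$ dips below zero.

```python
# f(x,y) = x^2 + y^4 and g(x,y) = x^2 - y^4 have the same degenerate Hessian
# at the origin; sampling along the y-axis exposes the difference.
import numpy as np

ys = np.linspace(-0.5, 0.5, 5)
print(ys**4)     # f(0, y) = y^4  >= 0: consistent with a minimum
print(-ys**4)    # g(0, y) = -y^4 <= 0: g is negative arbitrarily close to 0
```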
Peano’s example
One has to be careful with trying to just check for extrema (min or max) along lines. Consider
$$
f(x,y) = (y - x^2)(2x^2 - y).
$$
This function is zero on the two parabolas $y = x^2$ and $y = 2x^2$. In fact, it is negative outside the region between these two parabolas and positive between them.
[Figure: the parabolas $y = x^2$ and $y = 2x^2$ for $-10 \le x \le 10$; $f$ is positive between them and negative outside.]
If we restrict the function to any line through the origin, the origin looks like a local maximum. To see this, pick any angle $\theta$ and consider the line through the origin at the angle $\theta$ to the positive $x$-axis, that is, the line given by $\gamma(t) = \bigl(t\cos(\theta),\, t\sin(\theta)\bigr)$. Then $f\bigl(\gamma(0)\bigr) = 0$, while the function is actually strictly less than zero for $t$ near zero, $t \neq 0$. In fact, except for when $\theta = 0$ or $\theta = \pi$, we find
$$
\frac{d^2}{dt^2}\Bigl[f\bigl(\gamma(t)\bigr)\Bigr] < 0 \quad\text{at } t = 0,
$$
so it is a maximum (a strict one). When $\theta = 0$ or $\theta = \pi$, this corresponds to $y = 0$, and so
$$
f\bigl(\gamma(t)\bigr) = -2t^4,
$$
which also has a strict maximum at $t = 0$.
[Figure: the graph of $f(x,y) = (y - x^2)(2x^2 - y)$ near the origin, for $-0.5 \le x, y \le 0.5$.]
Clearly the function cannot have a min or a max at the origin: $f(0,0) = 0$, but there are points where $f$ is positive and points where $f$ is negative arbitrarily near the origin.
A computation shows that
$$
H_f = \begin{bmatrix} 0 & 0 \\ 0 & -2 \end{bmatrix}, \quad\text{so}\quad \det(H_f) = 0.
$$
So we wouldn't be any smarter from looking at the Hessian. The moral of the story is that if the Hessian has zero eigenvalues, you might have to get tricky to figure out what kind of critical point you have. There is no simple $n$th derivative test as there is for functions of one variable.
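As a sanity check of Peano's example, here is a short numerical sketch with NumPy: every line through the origin sees a maximum at $t = 0$, yet $f$ takes positive values between the parabolas arbitrarily close to the origin.

```python
# Peano's example: f restricted to any line through the origin has a local
# maximum at t = 0, but the origin is not a local maximum of f.
import numpy as np

def f(x, y):
    return (y - x**2) * (2*x**2 - y)

for theta in np.linspace(0.0, np.pi, 7):
    t = np.linspace(-1e-3, 1e-3, 201)
    vals = f(t * np.cos(theta), t * np.sin(theta))
    assert vals.max() <= 0.0 and f(0.0, 0.0) == 0.0  # restriction peaks at t = 0

x = 1e-3
print(f(x, 1.5 * x**2))   # 2.5e-13 > 0: positive between the parabolas
```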
More variables
Let's do this in 3 variables; with $n$ variables it is exactly the same idea. If $f(x,y,z)$ is a function of three variables, then the Hessian is the matrix
$$
H_f = \begin{bmatrix}
\frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x\,\partial y} & \frac{\partial^2 f}{\partial x\,\partial z} \\[4pt]
\frac{\partial^2 f}{\partial y\,\partial x} & \frac{\partial^2 f}{\partial y^2} & \frac{\partial^2 f}{\partial y\,\partial z} \\[4pt]
\frac{\partial^2 f}{\partial z\,\partial x} & \frac{\partial^2 f}{\partial z\,\partial y} & \frac{\partial^2 f}{\partial z^2}
\end{bmatrix}.
$$
Example: Let
$$
f(x,y,z) = 3x^2 + xz + 2zy - z^2.
$$
The Hessian matrix is
$$
H_f = \begin{bmatrix} 6 & 0 & 1 \\ 0 & 0 & 2 \\ 1 & 2 & -2 \end{bmatrix}, \quad\text{so}\quad \det(H_f) = -24 < 0.
$$
The determinant (the product of eigenvalues) is negative, so at least one eigenvalue
is negative. So this cannot possibly be a minimum. But we cannot yet decide
if it is a max or a saddle. However, we know we will be able to tell, since the Hessian is nondegenerate; we just have to try harder. Either by running through the linear algebra that you know, or by plugging the matrix into a computer or a calculator, you find that the eigenvalues are approximately $-3.31$, $6.13$, and $1.18$. So one negative and two positive: the critical point must be a saddle.
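For instance, with NumPy (a sketch; any eigenvalue routine will do the job):

```python
# The determinant only says "at least one negative eigenvalue"; computing
# the spectrum of the example's Hessian settles the question.
import numpy as np

H = np.array([[6.0, 0.0, 1.0],
              [0.0, 0.0, 2.0],
              [1.0, 2.0, -2.0]])
print(np.linalg.det(H))        # -24.0 (up to rounding)
print(np.linalg.eigvalsh(H))   # approx. [-3.31, 1.18, 6.13]: a saddle
```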
Let's do one more example. Suppose that at some critical point the Hessian works out to be
$$
H_f = \begin{bmatrix} -18 & 6 & 0 \\ 6 & -4 & -2 \\ 0 & -2 & -4 \end{bmatrix}, \quad\text{so}\quad \det(H_f) = -72.
$$
Again, negative, so at least one (or possibly all three) eigenvalues are negative. In this case one finds that the eigenvalues are approximately $-20.25$, $-0.70$, $-5.05$. So all three are negative, and the critical point is a local maximum.
There are other ways of figuring this out. For example, if the top left entry is
negative, the first 2 × 2 principal submatrix has positive determinant, and the full
matrix has negative determinant, then the eigenvalues are all negative. In this case
$$
-18 < 0, \qquad
\det \begin{bmatrix} -18 & 6 \\ 6 & -4 \end{bmatrix} = 36 > 0, \qquad
\det \begin{bmatrix} -18 & 6 & 0 \\ 6 & -4 & -2 \\ 0 & -2 & -4 \end{bmatrix} = -72 < 0.
$$
Basically you have to do some linear algebra to figure out the signs of the eigenvalues.
One common way to do this for $3 \times 3$ matrices (and this generalizes to $n \times n$) is the one we just used above. Given a symmetric matrix
$$
\begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix},
$$
it has three positive eigenvalues if
$$
a > 0, \qquad
\det \begin{bmatrix} a & b \\ b & d \end{bmatrix} > 0,
\qquad\text{and}\qquad
\det \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix} > 0.
$$
Computing determinants is a lot easier and a lot faster than computing the actual
eigenvalues.
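Here is a sketch of this criterion in Python with NumPy; `leading_minors` is our own helper, not a library routine. Applying the positivity test to $-H$ checks that all eigenvalues of $H$ are negative, which matches the example above.

```python
# Sylvester's criterion: a symmetric matrix has all positive eigenvalues
# exactly when every leading principal minor is positive.
import numpy as np

def leading_minors(H):
    return [np.linalg.det(H[:k, :k]) for k in range(1, H.shape[0] + 1)]

H = np.array([[-18.0, 6.0, 0.0],
              [6.0, -4.0, -2.0],
              [0.0, -2.0, -4.0]])
print(leading_minors(H))                        # approx. [-18.0, 36.0, -72.0]
print(all(m > 0 for m in leading_minors(-H)))   # True: all eigenvalues of H are negative
```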