The Hessian and Optimization
Writing $\vec{h} = (x - x_0, y - y_0)$, the Taylor expansion of $f$ at $(x_0, y_0)$ reads
$$
f(x,y) = f(x_0,y_0)
+ \left.\frac{\partial f}{\partial x}\right|_{(x_0,y_0)}\!(x-x_0)
+ \left.\frac{\partial f}{\partial y}\right|_{(x_0,y_0)}\!(y-y_0)
+ \underbrace{\frac{1}{2}\left.\frac{\partial^2 f}{\partial x^2}\right|_{(x_0,y_0)}\!(x-x_0)^2
+ \left.\frac{\partial^2 f}{\partial x\,\partial y}\right|_{(x_0,y_0)}\!(x-x_0)(y-y_0)
+ \frac{1}{2}\left.\frac{\partial^2 f}{\partial y^2}\right|_{(x_0,y_0)}\!(y-y_0)^2}_{\frac{1}{2}\,\vec{h}^{\,t}\,\left.H_f\right|_{(x_0,y_0)}\,\vec{h}}
+ \cdots
$$
As $\vec{h}$ is a column vector, recall that $\vec{h}^{\,t}$ is the transpose of $\vec{h}$, that is, the corresponding row vector, and so the computation becomes
" ∂2 f ∂2 f #
∂ (x − x )
h® H f h® = (x − x0 ) (y − y0 ) ∂∂2 xf ∂ 2 f 0
2
t x∂ y
.
(y − y 0 )
2 ∂ x∂ y ∂ y
The trick is to understand what these second derivatives tell us about the function. The best place to figure out what they mean is at a critical point, that is, a point where $\nabla f = 0$. For simplicity suppose that $x_0 = y_0 = 0$ and $f(x_0, y_0) = 0$, and of course $\nabla f|_{(0,0)} = 0$. Then we have that
$$
f(x,y) = \frac{1}{2}\,\vec{h}^{\,t}\,\left.H_f\right|_{(x_0,y_0)}\,\vec{h} + \cdots,
$$
and so $f$ behaves like a quadratic form near the origin. Suppose that $H_f$ is nondegenerate, that is, suppose that it has no zero eigenvalues. In that case it really behaves like one of three model cases (after a linear change of coordinates).
Two positive eigenvalues:
The model case is when the matrix looks like
$$
\frac{1}{2} H_f = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
$$
(we write half the matrix for simplicity). The function then becomes $x^2 + y^2$, and the graph of the form looks like
[Figure: the graph of $z = x^2 + y^2$ for $-10 \le x, y \le 10$: an upward-opening paraboloid with a minimum at the origin.]
Two negative eigenvalues:
The model case is the mirror image,
$$
\frac{1}{2} H_f = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix},
$$
so the function becomes $-x^2 - y^2$, and the graph is the downward-opening paraboloid:

[Figure: the graph of $z = -x^2 - y^2$ for $-10 \le x, y \le 10$, with a maximum at the origin.]
One positive and one negative eigenvalue:
The model case is when the matrix looks like
$$
\frac{1}{2} H_f = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.
$$
The function then becomes $x^2 - y^2$, and the graph of the form looks like
[Figure: the graph of $z = x^2 - y^2$ for $-10 \le x, y \le 10$: a saddle.]
How do we tell these cases apart without computing eigenvalues? The determinant of $H_f$ is the product of the eigenvalues, so if $\det(H_f) < 0$, the eigenvalues have opposite signs and the critical point is a saddle, while if $\det(H_f) > 0$, the eigenvalues have the same sign and the point is a minimum or a maximum. To decide which, look at the upper left number of the matrix. If it is positive, then the graph of the function must “curve up” in the $x$ direction, so there must be at least one positive eigenvalue; since both eigenvalues have the same sign, both are positive and the point is a minimum.
Let's see an example. Take
$$
f(x,y) = 3x^2 + 7xy + 5y^2.
$$
The origin is a critical point; we just need to decide whether it is a min, a max, or a saddle. The Hessian is
$$
H_f = \begin{bmatrix} 6 & 7 \\ 7 & 10 \end{bmatrix}, \quad\text{and so}\quad \det(H_f) = 6 \times 10 - 7 \times 7 = 11.
$$
So the critical point at the origin is either a local minimum or a maximum. Since the upper left component is $6$, which is positive, the surface curves upwards: both eigenvalues are positive and the point is a minimum for $f$. In fact the eigenvalues are approximately $0.72$ and $15.28$, but notice that we didn't have to compute them.
Now suppose
$$
g(x,y) = 3x^2 + 8xy + 5y^2.
$$
Again the origin is a critical point. The Hessian is
$$
H_g = \begin{bmatrix} 6 & 8 \\ 8 & 10 \end{bmatrix}, \quad\text{so}\quad \det(H_g) = 6 \times 10 - 8 \times 8 = -4.
$$
So the eigenvalues must be of opposite signs, and the origin must be a saddle point. In fact the eigenvalues are approximately $-0.25$ and $16.25$: neither a min nor a max.
Degenerate Hessian
Consider
$$
f(x,y) = x^2 + y^4.
$$
It is not difficult to convince yourself that this function is always positive except at the origin, where it is zero. So the origin is a minimum.
However, if you compute the Hessian you find
$$
H_f = \begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}, \quad\text{so}\quad \det(H_f) = 0.
$$
The matrix is degenerate (it must have a zero eigenvalue since the determinant is
zero). So we can’t quite tell from the second derivative.
To see why we can’t tell, try
$$
g(x,y) = x^2 - y^4.
$$
The graph of this is a saddle (just notice that $g(x,0) = x^2$ and $g(0,y) = -y^4$). If you compute the Hessian you find
$$
H_g = \begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}, \quad\text{so}\quad \det(H_g) = 0.
$$
The Hessian is the same as for $f$ above, so clearly we cannot read much from $H_g$ and $H_f$. One thing that we can read off is that the critical point is definitely not a maximum. That would require two negative eigenvalues, but one of the eigenvalues is in fact $2$, which is positive (the other is zero).
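To see the difference numerically, here is a tiny sketch (nothing beyond NumPy assumed): along the $y$-axis, where the Hessians are blind, $f$ stays nonnegative while $g$ dips below zero.

```python
# f(x,y) = x^2 + y^4 and g(x,y) = x^2 - y^4 have the same degenerate Hessian
# at the origin; sampling along the y-axis exposes the difference.
import numpy as np

ys = np.linspace(-0.5, 0.5, 5)
print(ys**4)     # f(0, y) = y^4  >= 0: consistent with a minimum
print(-ys**4)    # g(0, y) = -y^4 <= 0: g is negative arbitrarily close to 0
```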
Peano’s example
One has to be careful with trying to just check for extrema (min or max) along lines. Consider
$$
f(x,y) = (y - x^2)(2x^2 - y).
$$
This function is zero on the two parabolas $y = x^2$ and $y = 2x^2$. In fact, it is negative outside the region between these two parabolas and positive between them.
[Figure: the parabolas $y = x^2$ and $y = 2x^2$ for $-10 \le x \le 10$; $f$ is positive between them and negative outside.]
If we restrict the function to any line through the origin, the origin looks like a local maximum. To see this, pick any angle $\theta$ and consider the line through the origin at the angle $\theta$ to the positive $x$-axis, that is, the line given by $\gamma(t) = \bigl(t\cos(\theta),\, t\sin(\theta)\bigr)$. Then $f\bigl(\gamma(0)\bigr) = 0$, while the function is actually strictly less than zero for $t$ near zero, $t \neq 0$. In fact, except for when $\theta = 0$ or $\theta = \pi$, we find
$$
\frac{d^2}{dt^2}\Bigl[f\bigl(\gamma(t)\bigr)\Bigr] < 0 \quad\text{at } t = 0,
$$
so it is a maximum (a strict one). When $\theta = 0$ or $\theta = \pi$, this corresponds to $y = 0$, and so
$$
f\bigl(\gamma(t)\bigr) = -2t^4,
$$
which also has a strict maximum at $t = 0$.
[Figure: the graph of $f(x,y) = (y - x^2)(2x^2 - y)$ near the origin, for $-0.5 \le x, y \le 0.5$.]
Clearly the function cannot have a min or a max at the origin: $f(0,0) = 0$, but there are points where $f$ is positive and points where $f$ is negative arbitrarily near the origin.
A computation shows that
$$
H_f = \begin{bmatrix} 0 & 0 \\ 0 & -2 \end{bmatrix}, \quad\text{so}\quad \det(H_f) = 0.
$$
So we wouldn't be any smarter from looking at the Hessian. The moral of the story is that if the Hessian has zero eigenvalues, you might have to get tricky to figure out what kind of critical point you have. There is no simple $n$th derivative test as there is for functions of one variable.
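As a sanity check of Peano's example, here is a short numerical sketch with NumPy: every line through the origin sees a maximum at $t = 0$, yet $f$ takes positive values between the parabolas arbitrarily close to the origin.

```python
# Peano's example: f restricted to any line through the origin has a local
# maximum at t = 0, but the origin is not a local maximum of f.
import numpy as np

def f(x, y):
    return (y - x**2) * (2*x**2 - y)

for theta in np.linspace(0.0, np.pi, 7):
    t = np.linspace(-1e-3, 1e-3, 201)
    vals = f(t * np.cos(theta), t * np.sin(theta))
    assert vals.max() <= 0.0 and f(0.0, 0.0) == 0.0  # restriction peaks at t = 0

x = 1e-3
print(f(x, 1.5 * x**2))   # 2.5e-13 > 0: positive between the parabolas
```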
More variables
Let's do this in 3 variables; with $n$ variables it is exactly the same idea. If $f(x,y,z)$ is a function of three variables, then the Hessian is the matrix
$$
H_f = \begin{bmatrix}
\frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x\,\partial y} & \frac{\partial^2 f}{\partial x\,\partial z} \\[4pt]
\frac{\partial^2 f}{\partial y\,\partial x} & \frac{\partial^2 f}{\partial y^2} & \frac{\partial^2 f}{\partial y\,\partial z} \\[4pt]
\frac{\partial^2 f}{\partial z\,\partial x} & \frac{\partial^2 f}{\partial z\,\partial y} & \frac{\partial^2 f}{\partial z^2}
\end{bmatrix}.
$$
Example: Let
$$
f(x,y,z) = 3x^2 + xz + 2zy - z^2.
$$
The Hessian matrix is
$$
H_f = \begin{bmatrix} 6 & 0 & 1 \\ 0 & 0 & 2 \\ 1 & 2 & -2 \end{bmatrix}, \quad\text{so}\quad \det(H_f) = -24 < 0.
$$
The determinant (the product of eigenvalues) is negative, so at least one eigenvalue
is negative. So this cannot possibly be a minimum. But we cannot yet decide
if it is a max or a saddle. However, we know we will be able to tell, since the Hessian is nondegenerate; we just have to try harder. Either by running through the linear algebra that you know, or by plugging the matrix into a computer or a calculator, you find that the eigenvalues are approximately $-3.31$, $6.13$, and $1.18$. So one negative and two positive: the critical point must be a saddle.
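For instance, with NumPy (a sketch; any eigenvalue routine will do the job):

```python
# The determinant only says "at least one negative eigenvalue"; computing
# the spectrum of the example's Hessian settles the question.
import numpy as np

H = np.array([[6.0, 0.0, 1.0],
              [0.0, 0.0, 2.0],
              [1.0, 2.0, -2.0]])
print(np.linalg.det(H))        # -24.0 (up to rounding)
print(np.linalg.eigvalsh(H))   # approx. [-3.31, 1.18, 6.13]: a saddle
```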
Let's do one more example. Suppose that at some critical point the Hessian works out to be
$$
H_f = \begin{bmatrix} -18 & 6 & 0 \\ 6 & -4 & -2 \\ 0 & -2 & -4 \end{bmatrix}, \quad\text{so}\quad \det(H_f) = -72.
$$
Again, negative, so at least one (or possibly all three) eigenvalues are negative. In this case one finds that the eigenvalues are approximately $-20.25$, $-0.70$, $-5.05$. So all three are negative, and the critical point is a local maximum.
There are other ways of figuring this out. For example, if the top left entry is
negative, the first 2 × 2 principal submatrix has positive determinant, and the full
matrix has negative determinant, then the eigenvalues are all negative. In this case
$$
-18 < 0, \qquad
\det \begin{bmatrix} -18 & 6 \\ 6 & -4 \end{bmatrix} = 36 > 0, \qquad
\det \begin{bmatrix} -18 & 6 & 0 \\ 6 & -4 & -2 \\ 0 & -2 & -4 \end{bmatrix} = -72 < 0.
$$
Basically you have to do some linear algebra to figure out the signs of the eigenvalues.
One common way to do this for $3 \times 3$ matrices (and this generalizes to $n \times n$) is the one we just used above. Given a symmetric matrix
$$
\begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix},
$$
it has three positive eigenvalues if
$$
a > 0, \qquad
\det \begin{bmatrix} a & b \\ b & d \end{bmatrix} > 0,
\qquad\text{and}\qquad
\det \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix} > 0.
$$
Computing determinants is a lot easier and a lot faster than computing the actual
eigenvalues.
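Here is a sketch of this criterion in Python with NumPy; `leading_minors` is our own helper, not a library routine. Applying the positivity test to $-H$ checks that all eigenvalues of $H$ are negative, which matches the example above.

```python
# Sylvester's criterion: a symmetric matrix has all positive eigenvalues
# exactly when every leading principal minor is positive.
import numpy as np

def leading_minors(H):
    return [np.linalg.det(H[:k, :k]) for k in range(1, H.shape[0] + 1)]

H = np.array([[-18.0, 6.0, 0.0],
              [6.0, -4.0, -2.0],
              [0.0, -2.0, -4.0]])
print(leading_minors(H))                        # approx. [-18.0, 36.0, -72.0]
print(all(m > 0 for m in leading_minors(-H)))   # True: all eigenvalues of H are negative
```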