
The Hessian and optimization

Let us start with two dimensions:


Let $f(x, y)$ be a function of two variables. We write the Taylor expansion around $(x_0, y_0)$. Write the vector $\vec{h} = \langle x - x_0,\, y - y_0 \rangle$. We will consider $\vec{h}$ as a column vector when matrices are involved.
\[
\begin{aligned}
f(x, y) = f(x_0, y_0)
&+ \underbrace{\frac{\partial f}{\partial x}\bigg|_{(x_0, y_0)} (x - x_0) + \frac{\partial f}{\partial y}\bigg|_{(x_0, y_0)} (y - y_0)}_{\nabla f|_{(x_0, y_0)} \cdot \vec{h}} \\
&+ \underbrace{\frac{1}{2} \frac{\partial^2 f}{\partial x^2}\bigg|_{(x_0, y_0)} (x - x_0)^2 + \frac{\partial^2 f}{\partial x \partial y}\bigg|_{(x_0, y_0)} (x - x_0)(y - y_0) + \frac{1}{2} \frac{\partial^2 f}{\partial y^2}\bigg|_{(x_0, y_0)} (y - y_0)^2}_{\frac{1}{2} \vec{h}^{\,t}\, H_f|_{(x_0, y_0)}\, \vec{h}} + \cdots
\end{aligned}
\]

Here $H_f$ is the so-called Hessian matrix


" ∂2 f ∂2 f
#
∂ x2 ∂ x∂ y
Hf = ∂2 f ∂2 f .
∂ y∂ x ∂ y2

And as $\vec{h}$ is a column vector, recall that $\vec{h}^{\,t}$ is the transpose of $\vec{h}$, that is, the corresponding row vector, and so the computation becomes
\[
\vec{h}^{\,t} H_f \vec{h} =
\begin{bmatrix} (x - x_0) & (y - y_0) \end{bmatrix}
\begin{bmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\[4pt] \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} \end{bmatrix}
\begin{bmatrix} (x - x_0) \\ (y - y_0) \end{bmatrix}.
\]

The trick is to understand what these second derivatives tell us about the function. The best place to figure out what they mean is at a critical point, that is, a point where $\nabla f = 0$. For simplicity suppose that $x_0 = y_0 = 0$, that $f(x_0, y_0) = 0$, and of course $\nabla f|_{(0,0)} = 0$. Then we have that
\[
f(x, y) = \frac{1}{2} \vec{h}^{\,t}\, H_f|_{(x_0, y_0)}\, \vec{h} + \cdots,
\]
and so $f$ behaves like a quadratic form near the origin. Suppose that $H_f$ is nondegenerate, that is, suppose that it has no zero eigenvalues. In that case $f$ really behaves like one of three model cases (after a linear change of coordinates).
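One way to see where these three cases come from (a sketch, using the fact that a symmetric matrix can be diagonalized by an orthogonal matrix $Q$): write
\[
Q^t H_f Q = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix},
\]
so that in the rotated coordinates $\vec{u} = (u, v) = Q^t \vec{h}$ the quadratic form becomes $\tfrac{1}{2}\bigl(\lambda_1 u^2 + \lambda_2 v^2\bigr)$, and only the signs of the eigenvalues $\lambda_1$ and $\lambda_2$ matter.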

Two positive eigenvalues:
The model case is when the matrix looks like
\[
\frac{1}{2} H_f = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]
(we write half the matrix for simplicity). The function then becomes $x^2 + y^2$, and then the graph of the form looks like
[Figure: surface plot of $z = x^2 + y^2$ over $-10 \le x, y \le 10$; an upward-opening paraboloid with a minimum at the origin.]

Two negative eigenvalues:


The model case is when the matrix looks like
\[
\frac{1}{2} H_f = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}.
\]
The function then becomes $-x^2 - y^2$, and then the graph of the form looks like
[Figure: surface plot of $z = -x^2 - y^2$ over $-10 \le x, y \le 10$; a downward-opening paraboloid with a maximum at the origin.]

One positive and one negative eigenvalue:
The model case is when the matrix looks like
\[
\frac{1}{2} H_f = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.
\]
The function then becomes $x^2 - y^2$, and then the graph of the form looks like
[Figure: surface plot of $z = x^2 - y^2$ over $-10 \le x, y \le 10$; a saddle surface.]

Therefore, if we wish to figure out whether a critical point is a minimum, a maximum, or a saddle, we look at the Hessian. If it is nondegenerate, we figure out what signs the eigenvalues have.
Computing eigenvalues is difficult, but there is a simple way to tell the signs.
Notice
     
\[
\det \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = 1, \qquad \det \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} = 1, \qquad \det \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = -1.
\]
The determinant of a matrix is the product of its eigenvalues. Therefore, if the eigenvalues of a $2 \times 2$ matrix are of opposite signs, its determinant is negative. Be careful: the reasoning is different for $3 \times 3$ or larger matrices.
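For instance, a $3 \times 3$ matrix with eigenvalues $-1$, $-1$, and $2$ has determinant $(-1)(-1)(2) = 2 > 0$ even though the signs are mixed, so in three dimensions a positive determinant by itself does not pin down the signs.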
It turns out that if the two eigenvalues have the same sign, you can decide on the sign of the eigenvalues of a $2 \times 2$ matrix by just checking the upper left (or lower right) entry of the matrix. If it is positive, then the matrix has two positive eigenvalues. One way to see this is to notice that if you set $y = 0$ above, then $f(x, 0)$ is a function of $x$ alone, and its second derivative is the upper left entry of the matrix. If that is positive, then the graph of $f(x, 0)$ must "curve up," that is, $x = 0$ must be a minimum of $f(x, 0)$. Therefore, there must be at least one positive eigenvalue.
Let's see an example. Take
\[
f(x, y) = 3x^2 + 7xy + 5y^2.
\]
The origin is a critical point; we just need to decide whether it is a min, a max, or a saddle.
The Hessian is
\[
H_f = \begin{bmatrix} 6 & 7 \\ 7 & 10 \end{bmatrix}, \qquad \text{and so} \qquad \det(H_f) = 6 \times 10 - 7 \times 7 = 11.
\]
So the critical point at the origin is either a local minimum or a local maximum. Since the upper left entry is $6$, the surface must curve upwards, both eigenvalues are positive, and the point is a minimum for $f$. In fact the eigenvalues are approximately $0.72$ and $15.28$, but notice that we didn't have to check that.
Now suppose
\[
g(x, y) = 3x^2 + 8xy + 5y^2.
\]
Again the origin is a critical point. The Hessian is
\[
H_g = \begin{bmatrix} 6 & 8 \\ 8 & 10 \end{bmatrix}, \qquad \text{so} \qquad \det(H_g) = 6 \times 10 - 8 \times 8 = -4.
\]
So the eigenvalues must be of opposite signs, and the critical point must be a saddle point. In fact the eigenvalues are approximately $-0.25$ and $16.25$. The origin is neither a min nor a max.
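To double-check numbers like these, one can compute the determinants and eigenvalues directly. The following is a minimal sketch assuming NumPy is available; the matrices are the two Hessians from above.

```python
import numpy as np

# Hessians of f(x, y) = 3x^2 + 7xy + 5y^2 and g(x, y) = 3x^2 + 8xy + 5y^2
Hf = np.array([[6.0, 7.0], [7.0, 10.0]])
Hg = np.array([[6.0, 8.0], [8.0, 10.0]])

for name, H in [("Hf", Hf), ("Hg", Hg)]:
    det = np.linalg.det(H)
    eigs = np.linalg.eigvalsh(H)  # eigenvalues of a symmetric matrix, ascending
    print(name, "det =", round(det, 2), "eigenvalues =", np.round(eigs, 2))

# Prints approximately:
#   Hf det = 11.0 eigenvalues = [ 0.72 15.28]   (both positive: a minimum)
#   Hg det = -4.0 eigenvalues = [-0.25 16.25]   (opposite signs: a saddle)
```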

Degenerate Hessian
Consider
\[
f(x, y) = x^2 + y^4.
\]
It is not difficult to convince yourself that this function is always positive except at the origin, where it is zero. So the origin is a minimum.
However, if you compute the Hessian you find
\[
H_f = \begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}, \qquad \text{so} \qquad \det(H_f) = 0.
\]

The matrix is degenerate (it must have a zero eigenvalue since the determinant is
zero). So we can’t quite tell from the second derivative.
To see why we can't tell, try
\[
g(x, y) = x^2 - y^4.
\]
The graph of this is a saddle (just notice that $g(x, 0) = x^2$ and $g(0, y) = -y^4$). If you compute the Hessian you find
\[
H_g = \begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}, \qquad \text{so} \qquad \det(H_g) = 0.
\]
The Hessian is the same as for $f$ above, so clearly we cannot read much from $H_g$ and $H_f$. One thing that we can read off is that the critical point is definitely not a maximum. That would require two negative eigenvalues, but one of the eigenvalues is in fact 2, which is positive (the other is zero).
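As a symbolic sanity check (a minimal sketch assuming SymPy), one can compute both Hessians at the origin and see that they are identical:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + y**4
g = x**2 - y**4

for name, expr in [("f", f), ("g", g)]:
    # Hessian matrix evaluated at the origin
    H = sp.hessian(expr, (x, y)).subs({x: 0, y: 0})
    print(name, H.tolist(), "det =", H.det())

# Both print [[2, 0], [0, 0]] with det = 0, even though f has a minimum
# at the origin while g has a saddle there.
```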

Peano’s example
One has to be careful with trying to just check for extrema (min or max) along lines. Consider
\[
f(x, y) = (y - x^2)(2x^2 - y).
\]
This function is always zero on the curves $y = x^2$ and $y = 2x^2$. In fact, it is negative outside the region between these two curves and positive between them.
[Figure: the parabolas $y = x^2$ and $y = 2x^2$ for $-10 \le x \le 10$; $f$ is positive between the two curves and negative outside.]

Now restrict the function to a line through the origin: pick any angle $\theta$ and consider the line through the origin at angle $\theta$ to the positive $x$-axis, that is, the line given by $\gamma(t) = \bigl(t\cos(\theta),\, t\sin(\theta)\bigr)$. Then
\[
f\bigl(\gamma(t)\bigr) = f\bigl(t\cos(\theta),\, t\sin(\theta)\bigr).
\]
For any $\theta$, $f\bigl(\gamma(t)\bigr)$ has a strict maximum at $t = 0$. By a strict maximum we mean that the function is actually strictly less than zero for $t$ near zero, $t \neq 0$. In fact, except for when $\theta = 0$ or $\theta = \pi$, we find
\[
\frac{d^2}{dt^2} \Bigl[ f\bigl(\gamma(t)\bigr) \Bigr] \bigg|_{t=0} < 0,
\]
so $t = 0$ is a maximum (a strict one). When $\theta = 0$ or $\theta = \pi$, this corresponds to the line $y = 0$, and so
\[
f\bigl(\gamma(t)\bigr) = -2t^4,
\]
which clearly has a strict maximum at $t = 0$.
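This claim is easy to verify symbolically; here is a minimal sketch assuming SymPy.

```python
import sympy as sp

t, theta = sp.symbols('t theta', real=True)
x, y = sp.symbols('x y')
f = (y - x**2) * (2*x**2 - y)

# Restrict f to the line gamma(t) = (t*cos(theta), t*sin(theta))
restricted = f.subs({x: t * sp.cos(theta), y: t * sp.sin(theta)})

# Second derivative in t at t = 0: it simplifies to -2*sin(theta)**2,
# which is negative for every theta other than 0 and pi.
print(sp.simplify(sp.diff(restricted, t, 2).subs(t, 0)))

# Along theta = 0 (the x-axis) the restriction is -2*t**4,
# which still has a strict maximum at t = 0.
print(sp.expand(restricted.subs(theta, 0)))
```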


But f is a saddle! It is just a “bent saddle.”
[Figure: surface plot of $f(x, y) = (y - x^2)(2x^2 - y)$ for $-0.5 \le x, y \le 0.5$, showing the "bent saddle".]

Clearly the function cannot have a min or a max at the origin, since $f(0, 0) = 0$ but there are also points where $f$ is positive and points where $f$ is negative arbitrarily near the origin.
A computation shows that
\[
H_f = \begin{bmatrix} 0 & 0 \\ 0 & -2 \end{bmatrix}, \qquad \text{so} \qquad \det(H_f) = 0.
\]
So looking at the Hessian would not have made us any wiser. The moral of the story is that if the Hessian has zero eigenvalues, you might have to get tricky to figure out what kind of critical point you have. There is no simple $n$th derivative test as there is for functions of one variable.

More variables
Let's do this in 3 variables; with $n$ variables it is exactly the same idea. If $f(x, y, z)$ is a function of three variables, then the Hessian is the matrix
\[
H_f = \begin{bmatrix}
\frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial x \partial z} \\[4pt]
\frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} & \frac{\partial^2 f}{\partial y \partial z} \\[4pt]
\frac{\partial^2 f}{\partial z \partial x} & \frac{\partial^2 f}{\partial z \partial y} & \frac{\partial^2 f}{\partial z^2}
\end{bmatrix}.
\]

Again, a critical point is a local minimum if $H_f$ has 3 positive eigenvalues, it is a local maximum if it has 3 negative eigenvalues, and it is a saddle point if it has at least one positive and one negative eigenvalue.

Example: Let
\[
f(x, y, z) = 3x^2 + xz + 2zy - z^2.
\]
The Hessian matrix is
\[
H_f = \begin{bmatrix} 6 & 0 & 1 \\ 0 & 0 & 2 \\ 1 & 2 & -2 \end{bmatrix}, \qquad \text{so} \qquad \det(H_f) = -24 < 0.
\]
The determinant (the product of eigenvalues) is negative, so at least one eigenvalue
is negative. So this cannot possibly be a minimum. But we cannot yet decide
if it is a max or a saddle. However, we know we will be able to tell since it
is nondegenerate. We just have to try harder. Either by working through the linear algebra that you know, or by plugging the matrix into a computer or a calculator, you find that the eigenvalues are approximately $-3.31$, $6.13$, and $1.18$. So one is negative and two are positive; the critical point must be a saddle.
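Here too the eigenvalues can be confirmed numerically; a minimal sketch assuming NumPy:

```python
import numpy as np

# Hessian of f(x, y, z) = 3x^2 + xz + 2zy - z^2
Hf = np.array([[6.0, 0.0, 1.0],
               [0.0, 0.0, 2.0],
               [1.0, 2.0, -2.0]])

print("det =", round(np.linalg.det(Hf), 2))          # -24.0
print("eigenvalues =", np.round(np.linalg.eigvalsh(Hf), 2))
# approximately [-3.31  1.18  6.13]: one negative, two positive, so a saddle
```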

Another example: Let
\[
f(x, y, z) = -9x^2 + 6xy - 2y^2 - 2yz - 2z^2.
\]
The Hessian matrix is
\[
H_f = \begin{bmatrix} -18 & 6 & 0 \\ 6 & -4 & -2 \\ 0 & -2 & -4 \end{bmatrix}, \qquad \text{so} \qquad \det(H_f) = -72 < 0.
\]

Again, the determinant is negative, so at least one (or possibly all three) of the eigenvalues is negative. In this case one finds that the eigenvalues are approximately $-20.25$, $-0.70$, and $-5.05$. So all three are negative, and the critical point is a local maximum.
There are other ways of figuring this out. For example, if the top left entry is
negative, the first 2 × 2 principal submatrix has positive determinant, and the full
matrix has negative determinant, then the eigenvalues are all negative. In this case
  −18 6 0 
−18 6
−18 < 0, = 36 > 0, det ­  6 −4 −2 ® = −72 < 0,
©  ª
det
6 −4
«  0 −2 −4 ¬
 

Basically you have to do some linear algebra to figure out the signs of the eigen-
values.
One common way to do this for 3 by 3 (and this generalizes to n by n) is the one
we just used above. Given a symmetric matrix
\[
\begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix},
\]
it has three positive eigenvalues if
\[
a > 0, \qquad \det \begin{bmatrix} a & b \\ b & d \end{bmatrix} > 0, \qquad \text{and} \qquad \det \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix} > 0.
\]

And it has three negative eigenvalues if
\[
a < 0, \qquad \det \begin{bmatrix} a & b \\ b & d \end{bmatrix} > 0, \qquad \text{and} \qquad \det \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix} < 0.
\]

Computing determinants is a lot easier and a lot faster than computing the actual
eigenvalues.
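This leading-minor test is also easy to automate. Here is a minimal sketch, assuming NumPy, of a classifier for symmetric $3 \times 3$ matrices; the function name classify_3x3 is just made up for illustration.

```python
import numpy as np

def classify_3x3(H):
    """Classify a symmetric 3x3 matrix by its leading principal minors."""
    d1 = H[0, 0]
    d2 = np.linalg.det(H[:2, :2])
    d3 = np.linalg.det(H)            # determinant of the full matrix
    if d1 > 0 and d2 > 0 and d3 > 0:
        return "three positive eigenvalues (local minimum)"
    if d1 < 0 and d2 > 0 and d3 < 0:
        return "three negative eigenvalues (local maximum)"
    if d3 != 0:                      # naive exact-zero check; fine for a sketch
        return "mixed signs (saddle point)"
    return "degenerate (test is inconclusive)"

# The Hessian from the last example:
H = np.array([[-18.0, 6.0, 0.0],
              [6.0, -4.0, -2.0],
              [0.0, -2.0, -4.0]])
print(classify_3x3(H))   # three negative eigenvalues (local maximum)
```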
