
Max-Min Problems in $\mathbb{R}^n$ and the Hessian Matrix
Prerequisite: Section 6.3, Orthogonal Diagonalization
In this section, we study the problem of finding local maxima and minima for real-valued functions on $\mathbb{R}^n$. The method we describe is the higher-dimensional analogue of finding critical points and applying the second derivative test to functions on $\mathbb{R}$, as studied in first-semester calculus.

Taylor’s Theorem in $\mathbb{R}^n$
Let $f \in C^2(\mathbb{R}^n)$, where $C^2(\mathbb{R}^n)$ is the set of real-valued functions defined on $\mathbb{R}^n$ having continuous second partial derivatives. The method for solving for local extreme points of $f$ relies upon Taylor’s Theorem with second-degree remainder terms, which we state here without proof. (In the following theorem, an open hypersphere centered at $\mathbf{x}_0$ is a set of the form $\{\mathbf{x} \in \mathbb{R}^n \mid \|\mathbf{x} - \mathbf{x}_0\| < r\}$ for some positive real number $r$.)

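THEOREM 1
(Taylor’s Theorem in $\mathbb{R}^n$) Let $A$ be an open hypersphere centered at $\mathbf{x}_0 \in \mathbb{R}^n$, let $\mathbf{u}$ be a unit vector in $\mathbb{R}^n$, and let $t \in \mathbb{R}$ such that $\mathbf{x}_0 + t\mathbf{u} \in A$. Suppose $f: A \to \mathbb{R}$ has continuous second partial derivatives throughout $A$; that is, $f \in C^2(A)$. Then there is a $c$ with $0 \le c \le t$ such that
\[
\begin{aligned}
f(\mathbf{x}_0 + t\mathbf{u}) = {} & f(\mathbf{x}_0) + \sum_{i=1}^{n} \left.\frac{\partial f}{\partial x_i}\right|_{\mathbf{x}_0} (t u_i) + \frac{1}{2} \sum_{i=1}^{n} \left.\frac{\partial^2 f}{\partial x_i^2}\right|_{\mathbf{x}_0 + c\mathbf{u}} (t^2 u_i^2) \\
& + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \left.\frac{\partial^2 f}{\partial x_i\,\partial x_j}\right|_{\mathbf{x}_0 + c\mathbf{u}} (t^2 u_i u_j).
\end{aligned}
\]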

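Taylor’s Theorem in $\mathbb{R}^n$ is derived from the familiar Taylor’s Theorem in $\mathbb{R}$ by applying it to the function $g(t) = f(\mathbf{x}_0 + t\mathbf{u})$. In $\mathbb{R}^2$, the formula in Taylor’s Theorem is
\[
\begin{aligned}
f(\mathbf{x}_0 + t\mathbf{u}) = {} & f(\mathbf{x}_0) + \left.\frac{\partial f}{\partial x}\right|_{\mathbf{x}_0} (t u_1) + \left.\frac{\partial f}{\partial y}\right|_{\mathbf{x}_0} (t u_2) \\
& + \frac{1}{2} \left.\frac{\partial^2 f}{\partial x^2}\right|_{\mathbf{x}_0 + c\mathbf{u}} (t^2 u_1^2) + \frac{1}{2} \left.\frac{\partial^2 f}{\partial y^2}\right|_{\mathbf{x}_0 + c\mathbf{u}} (t^2 u_2^2) \\
& + \left.\frac{\partial^2 f}{\partial x\,\partial y}\right|_{\mathbf{x}_0 + c\mathbf{u}} (t^2 u_1 u_2).
\end{aligned}
\]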
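The reduction to one variable can be outlined as follows. This is only a sketch of the chain-rule computation behind the theorem, using the function $g(t) = f(\mathbf{x}_0 + t\mathbf{u})$ named above; it is not a full proof, and the abbreviation of the evaluation points is a choice made here for readability.

```latex
% Derivatives of g(t) = f(x_0 + t u) by the chain rule:
\begin{align*}
g'(t)  &= \sum_{i=1}^{n} \left.\frac{\partial f}{\partial x_i}\right|_{\mathbf{x}_0 + t\mathbf{u}} u_i, \\
g''(t) &= \sum_{i=1}^{n} \sum_{j=1}^{n} \left.\frac{\partial^2 f}{\partial x_i\,\partial x_j}\right|_{\mathbf{x}_0 + t\mathbf{u}} u_i u_j .
\end{align*}
% Taylor's Theorem in R, with second-degree remainder:
\[
g(t) = g(0) + g'(0)\,t + \tfrac{1}{2}\, g''(c)\, t^2 .
\]
% Equal mixed partials let the double sum split into diagonal terms
% and twice the terms with i < j, which removes the factor 1/2 there:
\[
\tfrac{1}{2}\, g''(c)\, t^2
= \frac{1}{2} \sum_{i=1}^{n} \left.\frac{\partial^2 f}{\partial x_i^2}\right|_{\mathbf{x}_0 + c\mathbf{u}} t^2 u_i^2
+ \sum_{i=1}^{n} \sum_{j=i+1}^{n} \left.\frac{\partial^2 f}{\partial x_i\,\partial x_j}\right|_{\mathbf{x}_0 + c\mathbf{u}} t^2 u_i u_j .
\]
```

Substituting $g(0) = f(\mathbf{x}_0)$ and the expression for $g'(0)$ recovers the three groups of terms in Theorem 1.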
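Recall that the gradient of $f$ is defined by $\nabla f = \left[\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n}\right]$. If we let $\mathbf{v} = t\mathbf{u}$, then, in $\mathbb{R}^2$, $\mathbf{v} = [v_1, v_2] = [tu_1, tu_2]$, and so the sum $\left.\frac{\partial f}{\partial x}\right|_{\mathbf{x}_0} (tu_1) + \left.\frac{\partial f}{\partial y}\right|_{\mathbf{x}_0} (tu_2)$ simplifies to $\left.\nabla f\right|_{\mathbf{x}_0} \cdot \mathbf{v}$. Also, since $f$ has continuous second partial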

Andrilli/Hecker— Elementary Linear Algebra, 4th ed.— March 15, 2010

Copyright © 2010, Elsevier Inc. All rights reserved.


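derivatives, we have $\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x}$. Therefore,
\[
\begin{aligned}
& \frac{1}{2} \frac{\partial^2 f}{\partial x^2} (t^2 u_1^2) + \frac{1}{2} \frac{\partial^2 f}{\partial y^2} (t^2 u_2^2) + \frac{\partial^2 f}{\partial x\,\partial y} (t^2 u_1 u_2) \\
& \quad = \frac{1}{2} v_1 \left( \frac{\partial^2 f}{\partial x^2} v_1 + \frac{\partial^2 f}{\partial x\,\partial y} v_2 \right) + \frac{1}{2} v_2 \left( \frac{\partial^2 f}{\partial y\,\partial x} v_1 + \frac{\partial^2 f}{\partial y^2} v_2 \right) \\
& \quad = \frac{1}{2} \mathbf{v}^T \begin{bmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x\,\partial y} \\ \frac{\partial^2 f}{\partial y\,\partial x} & \frac{\partial^2 f}{\partial y^2} \end{bmatrix} \mathbf{v},
\end{aligned}
\]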

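where $\mathbf{v}$ is considered to be a column vector. The matrix
\[
\mathbf{H} = \begin{bmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x\,\partial y} \\ \frac{\partial^2 f}{\partial y\,\partial x} & \frac{\partial^2 f}{\partial y^2} \end{bmatrix}
\]
in this expression is called the Hessian matrix for $f$. Thus, in the $\mathbb{R}^2$ case, with $\mathbf{v} = t\mathbf{u}$, the formula in Taylor’s Theorem can be written as
\[
f(\mathbf{x}_0 + \mathbf{v}) = f(\mathbf{x}_0) + \left.\nabla f\right|_{\mathbf{x}_0} \cdot \mathbf{v} + \frac{1}{2} \mathbf{v}^T \left.\mathbf{H}\right|_{\mathbf{x}_0 + k\mathbf{v}} \mathbf{v},
\]
for some $k$ with $0 \le k \le 1$ (where $k = c/t$). While we have derived this result in $\mathbb{R}^2$, the same formula holds in $\mathbb{R}^n$, where the Hessian $\mathbf{H}$ is the matrix whose $(i, j)$ entry is $\frac{\partial^2 f}{\partial x_i\,\partial x_j}$.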
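As a rough numerical illustration of these definitions, the gradient and the Hessian matrix can be approximated with central differences. The following is a minimal sketch in plain Python; the helper names `gradient` and `hessian`, the step sizes, and the sample function `q` are assumptions made for this illustration and do not come from the text.

```python
def gradient(f, x, h=1e-5):
    """Approximate the gradient of f at the point x (a list of floats)."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def hessian(f, x, h=1e-4):
    """Approximate the Hessian of f at x; entry (i, j) estimates d^2 f / dx_i dx_j."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                # Evaluate f with coordinate i moved by si*h and coordinate j by sj*h.
                y = list(x)
                y[i] += si * h
                y[j] += sj * h
                return f(y)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H


def q(p):
    """Hypothetical sample function q(x, y) = x^2 + 3xy + 2y^2."""
    x, y = p
    return x**2 + 3 * x * y + 2 * y**2


grad_q = gradient(q, [1.0, 2.0])   # exact gradient is [8, 11]
hess_q = hessian(q, [1.0, 2.0])    # exact Hessian is [[2, 3], [3, 4]]
print(grad_q)
print(hess_q)
```

The printed values agree with the exact ones only up to rounding error, so any comparison should use a tolerance rather than equality.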

Critical Points
If $A$ is a subset of $\mathbb{R}^n$, then we say that $f: A \to \mathbb{R}$ has a local maximum at a point $\mathbf{x}_0 \in A$ if and only if there is an open neighborhood $U$ of $\mathbf{x}_0$ such that $f(\mathbf{x}_0) \ge f(\mathbf{x})$ for all $\mathbf{x} \in U$. A local minimum for a function $f$ is defined analogously.

THEOREM 2
Let $A$ be an open hypersphere centered at $\mathbf{x}_0 \in \mathbb{R}^n$, and let $f: A \to \mathbb{R}$ have continuous first partial derivatives on $A$. If $f$ has a local maximum or a local minimum at $\mathbf{x}_0$, then $\nabla f(\mathbf{x}_0) = \mathbf{0}$.

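Proof If $\mathbf{x}_0$ is a local maximum, then $f(\mathbf{x}_0 + h\mathbf{e}_i) - f(\mathbf{x}_0) \le 0$ for small $h$. Then, $\lim_{h \to 0^+} \frac{f(\mathbf{x}_0 + h\mathbf{e}_i) - f(\mathbf{x}_0)}{h} \le 0$. Similarly, $\lim_{h \to 0^-} \frac{f(\mathbf{x}_0 + h\mathbf{e}_i) - f(\mathbf{x}_0)}{h} \ge 0$. Hence, for the limit to exist, we must have $\left.\frac{\partial f}{\partial x_i}\right|_{\mathbf{x}_0} = 0$. Since this is true for each $i$, $\left.\nabla f\right|_{\mathbf{x}_0} = \mathbf{0}$. A similar proof works for local minimums. QED

Points $\mathbf{x}_0$ at which $\nabla f(\mathbf{x}_0) = \mathbf{0}$ are called critical points.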

Example 1 Let $f: \mathbb{R}^2 \to \mathbb{R}$ be given by
\[
f(x, y) = 7x^2 + 6xy + 2x + 7y^2 - 22y + 23.
\]
Then $\nabla f = [14x + 6y + 2,\; 6x + 14y - 22]$. We find critical points for $f$ by solving $\nabla f = \mathbf{0}$. This is the linear system
\[
\begin{cases} 14x + 6y + 2 = 0 \\ 6x + 14y - 22 = 0 \end{cases},
\]
which has the unique solution $\mathbf{x}_0 = [-1, 2]$. Hence, by Theorem 2, $(-1, 2)$ is the only possible extreme point for $f$. (We will see later that $(-1, 2)$ is a local minimum.)
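The linear system in Example 1 is small enough to solve exactly by machine. The sketch below uses Cramer's rule with exact rational arithmetic; the function name `solve_2x2` is hypothetical, and Cramer's rule is simply one convenient method for a 2 x 2 system, not necessarily the one the text has in mind.

```python
from fractions import Fraction


def solve_2x2(a, b, c, d, e, f):
    """Solve a*x + b*y = e and c*x + d*y = f by Cramer's rule."""
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("coefficient matrix is singular")
    x = Fraction(e * d - b * f) / det
    y = Fraction(a * f - e * c) / det
    return x, y


# Gradient system from Example 1, with the constants moved to the right side:
#   14x +  6y = -2
#    6x + 14y = 22
x0, y0 = solve_2x2(14, 6, 6, 14, -2, 22)
print(x0, y0)  # -1 2
```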

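Sufficient Conditions for Local Extreme Points
If $\mathbf{x}_0$ is a critical point for a function $f$, how can we determine whether $\mathbf{x}_0$ is a local maximum or a local minimum? For functions on $\mathbb{R}$, we have the second derivative test from calculus, which says that if $f''(x_0) < 0$, then $x_0$ is a local maximum, but if $f''(x_0) > 0$, then $x_0$ is a local minimum. We now derive a similar test in $\mathbb{R}^n$.
Consider the following formula from Taylor’s Theorem:
\[
f(\mathbf{x}_0 + \mathbf{v}) = f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0) \cdot \mathbf{v} + \frac{1}{2} \mathbf{v}^T \left.\mathbf{H}\right|_{\mathbf{x}_0 + k\mathbf{v}} \mathbf{v}.
\]
At a critical point, $\nabla f(\mathbf{x}_0) = \mathbf{0}$, and so
\[
f(\mathbf{x}_0 + \mathbf{v}) = f(\mathbf{x}_0) + \frac{1}{2} \mathbf{v}^T \left.\mathbf{H}\right|_{\mathbf{x}_0 + k\mathbf{v}} \mathbf{v}.
\]
Hence, if $\mathbf{v}^T \left.\mathbf{H}\right|_{\mathbf{x}_0 + k\mathbf{v}} \mathbf{v}$ is positive for all small nonzero vectors $\mathbf{v}$, then $f$ will have a local minimum at $\mathbf{x}_0$. (Similarly, if $\mathbf{v}^T \left.\mathbf{H}\right|_{\mathbf{x}_0 + k\mathbf{v}} \mathbf{v}$ is negative, $f$ will have a local maximum.) But since we assume that $f$ has continuous second partial derivatives, $\mathbf{v}^T \left.\mathbf{H}\right|_{\mathbf{x}_0 + k\mathbf{v}} \mathbf{v}$ is continuous in $\mathbf{v}$ and $k$, and will be positive for small $\mathbf{v}$ if $\mathbf{v}^T \left.\mathbf{H}\right|_{\mathbf{x}_0} \mathbf{v}$ is positive for all nonzero $\mathbf{v}$. Hence,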

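THEOREM 3
Given the conditions of Taylor’s Theorem for a set $A$ and a function $f: A \to \mathbb{R}$, $f$ has a local minimum at a critical point $\mathbf{x}_0$ if $\mathbf{v}^T \left.\mathbf{H}\right|_{\mathbf{x}_0} \mathbf{v} > 0$ for all nonzero vectors $\mathbf{v}$. Similarly, $f$ has a local maximum at a critical point $\mathbf{x}_0$ if $\mathbf{v}^T \left.\mathbf{H}\right|_{\mathbf{x}_0} \mathbf{v} < 0$ for all nonzero vectors $\mathbf{v}$.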

Positive Definite Quadratic Forms
If $\mathbf{v}$ is a vector in $\mathbb{R}^n$, and $\mathbf{A}$ is an $n \times n$ matrix, the expression $\mathbf{v}^T \mathbf{A} \mathbf{v}$ is known as a quadratic form. (For more details on the general theory of quadratic forms, see Section 8.11.) A quadratic form such that $\mathbf{v}^T \mathbf{A} \mathbf{v} > 0$ for all nonzero vectors $\mathbf{v}$ is said to be positive definite. Similarly, a quadratic form such that $\mathbf{v}^T \mathbf{A} \mathbf{v} < 0$ for all nonzero vectors $\mathbf{v}$ is said to be negative definite.
Now, in particular, the expression $\mathbf{v}^T \left.\mathbf{H}\right|_{\mathbf{x}_0} \mathbf{v}$ in Theorem 3 is a quadratic form. Theorem 3 then says that if $\mathbf{v}^T \left.\mathbf{H}\right|_{\mathbf{x}_0} \mathbf{v}$ is a positive definite quadratic form at a critical point $\mathbf{x}_0$, then $f$ has a local minimum at $\mathbf{x}_0$. Theorem 3 also says that if $\mathbf{v}^T \left.\mathbf{H}\right|_{\mathbf{x}_0} \mathbf{v}$ is a negative definite quadratic form at a critical point $\mathbf{x}_0$, then $f$ has a local maximum at $\mathbf{x}_0$. Therefore, we need a method to determine whether a quadratic form of this type is positive definite or negative definite.
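Now, the Hessian matrix $\left.\mathbf{H}\right|_{\mathbf{x}_0}$, which we will abbreviate as $\mathbf{H}$, is symmetric because $\frac{\partial^2 f}{\partial x_i\,\partial x_j} = \frac{\partial^2 f}{\partial x_j\,\partial x_i}$ (since $f \in C^2(A)$). Hence, by Theorem 6.20, $\mathbf{H}$ can be orthogonally diagonalized. That is, there is an orthogonal matrix $\mathbf{P}$ such that $\mathbf{P}\mathbf{H}\mathbf{P}^T = \mathbf{D}$, a diagonal matrix, and so $\mathbf{H} = \mathbf{P}^T \mathbf{D} \mathbf{P}$. Hence, $\mathbf{v}^T \mathbf{H} \mathbf{v} = \mathbf{v}^T \mathbf{P}^T \mathbf{D} \mathbf{P} \mathbf{v} = (\mathbf{P}\mathbf{v})^T \mathbf{D} (\mathbf{P}\mathbf{v})$. Letting $\mathbf{w} = \mathbf{P}\mathbf{v}$, we get $\mathbf{v}^T \mathbf{H} \mathbf{v} = \mathbf{w}^T \mathbf{D} \mathbf{w}$. But $\mathbf{P}$ is nonsingular, so as $\mathbf{v}$ ranges over all of $\mathbb{R}^n$, so does $\mathbf{w}$, and vice versa. Thus, $\mathbf{v}^T \mathbf{H} \mathbf{v} > 0$ for all nonzero $\mathbf{v}$ if and only if $\mathbf{w}^T \mathbf{D} \mathbf{w} > 0$ for all nonzero $\mathbf{w}$. Now, $\mathbf{D}$ is diagonal, and so $\mathbf{w}^T \mathbf{D} \mathbf{w} = d_{11} w_1^2 + d_{22} w_2^2 + \cdots + d_{nn} w_n^2$. But the $d_{ii}$’s are the eigenvalues of $\mathbf{H}$. Thus, it follows that $\mathbf{w}^T \mathbf{D} \mathbf{w} > 0$ for all nonzero $\mathbf{w}$ if and only if all of these eigenvalues are positive. (Set $\mathbf{w} = \mathbf{e}_i$ for each $i$ to prove the “only if” part of this statement.) Similarly, $\mathbf{w}^T \mathbf{D} \mathbf{w} < 0$ for all nonzero $\mathbf{w}$ if and only if all of these eigenvalues are negative. Hence,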

THEOREM 4
A symmetric matrix $\mathbf{A}$ defines a positive definite quadratic form $\mathbf{v}^T \mathbf{A} \mathbf{v}$ if and only if all of the eigenvalues of $\mathbf{A}$ are positive. A symmetric matrix $\mathbf{A}$ defines a negative definite quadratic form $\mathbf{v}^T \mathbf{A} \mathbf{v}$ if and only if all of the eigenvalues of $\mathbf{A}$ are negative.
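The change of variables $\mathbf{w} = \mathbf{P}\mathbf{v}$ behind Theorem 4 can be checked by hand on a small case. The matrix $\mathbf{A}$ below is a hypothetical example chosen for easy arithmetic; it does not appear in the text.

```latex
% A symmetric matrix, an orthogonal P whose rows are unit eigenvectors,
% and the resulting diagonal matrix D = P A P^T:
\[
\mathbf{A} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \qquad
\mathbf{P} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}, \qquad
\mathbf{P}\mathbf{A}\mathbf{P}^T = \mathbf{D} = \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}.
\]
% With w = P v we have w_1 = (v_1 - v_2)/sqrt(2) and w_2 = (v_1 + v_2)/sqrt(2), so
\[
\mathbf{w}^T \mathbf{D} \mathbf{w}
= \frac{(v_1 - v_2)^2}{2} + \frac{3 (v_1 + v_2)^2}{2}
= 2 v_1^2 + 2 v_1 v_2 + 2 v_2^2
= \mathbf{v}^T \mathbf{A} \mathbf{v}.
\]
```

Both eigenvalues (1 and 3) are positive, and the middle expression is a sum of squares with positive coefficients, so this quadratic form is positive definite, in agreement with Theorem 4.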

Hence, Theorem 3 can be restated as follows:

Given the conditions of Taylor’s Theorem for a set $A$ and a function $f: A \to \mathbb{R}$:
(1) if all of the eigenvalues of $\mathbf{H}$ are positive at a critical point $\mathbf{x}_0$, then $f$ has a local minimum at $\mathbf{x}_0$, and
(2) if all of the eigenvalues of $\mathbf{H}$ are negative at a critical point $\mathbf{x}_0$, then $f$ has a local maximum at $\mathbf{x}_0$.

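Example 2 Consider the function
\[
f(x, y) = 7x^2 + 6xy + 2x + 7y^2 - 22y + 23.
\]
In Example 1, we found that $f$ has a critical point at $\mathbf{x}_0 = [-1, 2]$. Now, the Hessian matrix for $f$ at $\mathbf{x}_0$ is
\[
\mathbf{H} = \left.\begin{bmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x\,\partial y} \\ \frac{\partial^2 f}{\partial y\,\partial x} & \frac{\partial^2 f}{\partial y^2} \end{bmatrix}\right|_{\mathbf{x}_0} = \begin{bmatrix} 14 & 6 \\ 6 & 14 \end{bmatrix}.
\]
But $p_{\mathbf{H}}(x) = x^2 - 28x + 160$, which has roots $x = 8$ and $x = 20$. Thus, all eigenvalues of $\mathbf{H}$ are positive, and hence, by Theorem 4, $\mathbf{v}^T \mathbf{H} \mathbf{v}$ is positive definite. Theorem 3 then tells us that $\mathbf{x}_0 = [-1, 2]$ is a local minimum for $f$.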
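The eigenvalues in Example 2 can be double-checked with the closed-form roots of the characteristic polynomial of a symmetric 2 x 2 matrix. This is a small sketch; the function name `symmetric_2x2_eigenvalues` is an assumption made for this illustration.

```python
import math


def symmetric_2x2_eigenvalues(a, b, c):
    """Eigenvalues of [[a, b], [b, c]], returned in increasing order.

    They are the roots of x^2 - (a + c)x + (ac - b^2), namely
    (a + c)/2 -/+ sqrt(((a - c)/2)^2 + b^2).
    """
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return mean - radius, mean + radius


# Hessian matrix from Example 2: [[14, 6], [6, 14]]
low, high = symmetric_2x2_eigenvalues(14, 6, 14)
print(low, high)  # 8.0 20.0
```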

Local Maxima and Minima in $\mathbb{R}^2$
It can be shown (see Exercise 3) that a $2 \times 2$ symmetric matrix $\mathbf{A}$ defines a positive definite quadratic form ($\mathbf{v}^T \mathbf{A} \mathbf{v} > 0$ for all nonzero $\mathbf{v}$) if and only if $a_{11} > 0$ and $|\mathbf{A}| > 0$. Similarly, a $2 \times 2$ symmetric matrix defines a negative definite quadratic form ($\mathbf{v}^T \mathbf{A} \mathbf{v} < 0$ for all nonzero $\mathbf{v}$) if and only if $a_{11} < 0$ and $|\mathbf{A}| > 0$.
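The 2 x 2 criterion translates directly into a few lines of code. The sketch below is illustrative only; the function name `classify_2x2` and the returned labels are assumptions, and the second and third test matrices are hypothetical examples.

```python
def classify_2x2(a11, a12, a22):
    """Classify the quadratic form of the symmetric matrix [[a11, a12], [a12, a22]]."""
    det = a11 * a22 - a12 * a12
    if a11 > 0 and det > 0:
        return "positive definite"
    if a11 < 0 and det > 0:
        return "negative definite"
    return "neither"


print(classify_2x2(14, 6, 14))  # positive definite (Hessian from Example 2)
print(classify_2x2(-3, 1, -2))  # negative definite
print(classify_2x2(1, 2, 1))    # neither
```

Note that "neither" covers both indefinite forms and forms that are only semidefinite; the criterion above does not distinguish between them.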

Example 3 Suppose $f(x, y) = 2x^2 - 2x^2y^2 + 2y^2 + 24y - x^4 - y^4$. First, we look for critical points by solving the system
\[
\begin{cases}
\frac{\partial f}{\partial x} = 4x - 4xy^2 - 4x^3 = 4x(1 - (y^2 + x^2)) = 0 \\
\frac{\partial f}{\partial y} = -4x^2y + 4y + 24 - 4y^3 = -4y(x^2 + y^2) + 4y + 24 = 0
\end{cases}.
\]
Now $\frac{\partial f}{\partial x} = 0$ yields $x = 0$ or $y^2 + x^2 = 1$. If $x = 0$, then $\frac{\partial f}{\partial y} = 0$ gives $4y + 24 - 4y^3 = 0$. The unique real solution to this equation is $y = 2$. Thus, $[0, 2]$ is a critical point. If $x \ne 0$, then $y^2 + x^2 = 1$. From $\frac{\partial f}{\partial y} = 0$, we have $0 = -4y(1) + 4y + 24 = 24$, a contradiction, so there is no critical point when $x \ne 0$.


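Next, we compute the Hessian matrix at the critical point $[0, 2]$.
\[
\mathbf{H} = \left.\begin{bmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x\,\partial y} \\ \frac{\partial^2 f}{\partial y\,\partial x} & \frac{\partial^2 f}{\partial y^2} \end{bmatrix}\right|_{[0,2]}
= \left.\begin{bmatrix} 4 - 4y^2 - 12x^2 & -8xy \\ -8xy & -4x^2 + 4 - 12y^2 \end{bmatrix}\right|_{[0,2]}
= \begin{bmatrix} -12 & 0 \\ 0 & -44 \end{bmatrix}.
\]
Since the $(1, 1)$ entry is negative and $|\mathbf{H}| > 0$, $\mathbf{H}$ defines a negative definite quadratic form and so $f$ has a local maximum at $[0, 2]$.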
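The arithmetic in Example 3 is easy to confirm by evaluating the partial derivatives at the critical point. In this sketch the function names `grad_f` and `hessian_f` are hypothetical; the formulas inside them are the partial derivatives computed in the example.

```python
def grad_f(x, y):
    """First partial derivatives of f from Example 3."""
    fx = 4 * x - 4 * x * y**2 - 4 * x**3
    fy = -4 * x**2 * y + 4 * y + 24 - 4 * y**3
    return fx, fy


def hessian_f(x, y):
    """Second partial derivatives of f from Example 3, as a 2 x 2 matrix."""
    fxx = 4 - 4 * y**2 - 12 * x**2
    fxy = -8 * x * y
    fyy = -4 * x**2 + 4 - 12 * y**2
    return [[fxx, fxy], [fxy, fyy]]


gradient_at_point = grad_f(0, 2)
H = hessian_f(0, 2)
det_H = H[0][0] * H[1][1] - H[0][1] * H[1][0]
print(gradient_at_point)  # (0, 0)
print(H)                  # [[-12, 0], [0, -44]]
print(det_H)              # 528
```

A negative (1, 1) entry together with a positive determinant is exactly the negative definite case of the 2 x 2 criterion.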

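An Example in $\mathbb{R}^3$

Example 4 Consider the function
\[
g(x, y, z) = 5x^2 + 2xz + 4xy + 10x + 3z^2 - 6yz - 6z + 5y^2 + 12y + 21.
\]
We find the critical points by solving the system
\[
\begin{cases}
\frac{\partial g}{\partial x} = 10x + 2z + 4y + 10 = 0 \\
\frac{\partial g}{\partial y} = 4x - 6z + 10y + 12 = 0 \\
\frac{\partial g}{\partial z} = 2x + 6z - 6y - 6 = 0
\end{cases}.
\]
Using row reduction to solve this linear system yields the unique critical point $[-9, 12, 16]$. The Hessian matrix at $[-9, 12, 16]$ is
\[
\mathbf{H} = \left.\begin{bmatrix}
\frac{\partial^2 g}{\partial x^2} & \frac{\partial^2 g}{\partial x\,\partial y} & \frac{\partial^2 g}{\partial x\,\partial z} \\
\frac{\partial^2 g}{\partial y\,\partial x} & \frac{\partial^2 g}{\partial y^2} & \frac{\partial^2 g}{\partial y\,\partial z} \\
\frac{\partial^2 g}{\partial z\,\partial x} & \frac{\partial^2 g}{\partial z\,\partial y} & \frac{\partial^2 g}{\partial z^2}
\end{bmatrix}\right|_{[-9,12,16]}
= \begin{bmatrix} 10 & 4 & 2 \\ 4 & 10 & -6 \\ 2 & -6 & 6 \end{bmatrix}.
\]
A lengthy computation produces $p_{\mathbf{H}}(x) = x^3 - 26x^2 + 164x - 8$. The roots of $p_{\mathbf{H}}(x)$ are approximately $0.04916$, $10.6011$, and $15.3497$. Since all of these eigenvalues for $\mathbf{H}$ are positive, $[-9, 12, 16]$ is a local minimum for $g$.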
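The numbers in Example 4 can be checked with a short script: the gradient should vanish at the critical point, and the roots of the characteristic polynomial can be located by bisection. This is a sketch under stated assumptions: the helper names are hypothetical, and the bracketing intervals (0, 1), (10, 11), and (15, 16) were chosen here by noting that $p_{\mathbf{H}}$ changes sign on each of them.

```python
def grad_g(x, y, z):
    """First partial derivatives of g from Example 4."""
    return (10 * x + 2 * z + 4 * y + 10,
            4 * x - 6 * z + 10 * y + 12,
            2 * x + 6 * z - 6 * y - 6)


def p_H(x):
    """Characteristic polynomial of the Hessian matrix in Example 4."""
    return x**3 - 26 * x**2 + 164 * x - 8


def bisect_root(p, lo, hi, tol=1e-10):
    """Find a root of p in [lo, hi], assuming p(lo) and p(hi) have opposite signs."""
    if p(lo) * p(hi) > 0:
        raise ValueError("p must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if p(lo) * p(mid) <= 0:
            hi = mid   # the sign change lies in the left half
        else:
            lo = mid   # the sign change lies in the right half
    return (lo + hi) / 2


critical_gradient = grad_g(-9, 12, 16)
roots = [bisect_root(p_H, a, b) for a, b in [(0, 1), (10, 11), (15, 16)]]
print(critical_gradient)  # (0, 0, 0)
print(roots)
```

As a sanity check, the three roots should sum to 26, the trace of the Hessian matrix.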

Failure of the Hessian Matrix Test
In calculus, we discovered that the second derivative test fails when the second derivative is zero at a critical point. A similar situation is true in $\mathbb{R}^n$. If the Hessian matrix at a critical point has 0 as an eigenvalue, and all other eigenvalues have the same sign, then the function $f$ could have a local maximum, a local minimum, or neither at this critical point. Of course, if the Hessian matrix at a critical point has two eigenvalues with opposite signs, the critical point is not a local extreme point (why?). Exercise 2 illustrates these concepts.
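The cases described in this section can be collected into one decision procedure that takes the eigenvalues of the Hessian matrix at a critical point. The following is a minimal sketch; the function name `classify_critical_point`, the returned labels, and the tolerance `tol` used to decide when an eigenvalue counts as zero are all assumptions made for this illustration.

```python
def classify_critical_point(eigenvalues, tol=1e-12):
    """Classify a critical point from the eigenvalues of its Hessian matrix."""
    if not eigenvalues:
        raise ValueError("at least one eigenvalue is required")
    has_positive = any(ev > tol for ev in eigenvalues)
    has_negative = any(ev < -tol for ev in eigenvalues)
    has_zero = any(abs(ev) <= tol for ev in eigenvalues)
    if has_positive and has_negative:
        return "not a local extreme point"
    if has_zero:
        return "test fails"
    return "local minimum" if has_positive else "local maximum"


print(classify_critical_point([8, 20]))     # local minimum
print(classify_critical_point([-12, -44]))  # local maximum
print(classify_critical_point([2, -2, 0]))  # not a local extreme point
print(classify_critical_point([0, 2]))      # test fails
```

When the result is "test fails", the function itself has to be examined near the critical point, since the eigenvalues alone do not settle the question.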

New Vocabulary
$C^2(\mathbb{R}^n)$ (functions from $\mathbb{R}^n$ to $\mathbb{R}$ having continuous second partial derivatives)
critical point (of a function)
gradient (of a function on $\mathbb{R}^n$)
Hessian matrix
local maximum (of a function on $\mathbb{R}^n$)
local minimum (of a function on $\mathbb{R}^n$)
negative definite quadratic form
open hypersphere (in $\mathbb{R}^n$)
positive definite quadratic form
Taylor’s Theorem (in $\mathbb{R}^n$)

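Highlights

The gradient of a function $f: \mathbb{R}^n \to \mathbb{R}$ is defined by $\nabla f = \left[\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n}\right]$.

Let $A$ be an open hypersphere about $\mathbf{x}_0$, and let $f$ be a function on $A$ with continuous partial derivatives. If $f$ has a local maximum or minimum at $\mathbf{x}_0$, then $\nabla f(\mathbf{x}_0) = \mathbf{0}$.

For a function $f: \mathbb{R}^n \to \mathbb{R}$, its corresponding Hessian matrix $\mathbf{H}$ is the $n \times n$ matrix whose $(i, j)$ entry is $\frac{\partial^2 f}{\partial x_i\,\partial x_j}$. In particular, for a function $f: \mathbb{R}^2 \to \mathbb{R}$, the Hessian matrix is $\mathbf{H} = \begin{bmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x\,\partial y} \\ \frac{\partial^2 f}{\partial y\,\partial x} & \frac{\partial^2 f}{\partial y^2} \end{bmatrix}$.

Taylor’s Theorem in $\mathbb{R}^n$: Let $A$ be an open hypersphere centered at $\mathbf{x}_0 \in \mathbb{R}^n$, let $\mathbf{u}$ be a unit vector in $\mathbb{R}^n$, and let $t \in \mathbb{R}$ such that $\mathbf{x}_0 + t\mathbf{u} \in A$. Suppose $f: A \to \mathbb{R}$ has continuous second partial derivatives throughout $A$; that is, $f \in C^2(A)$. Then there is a $c$ with $0 \le c \le t$ such that
\[
f(\mathbf{x}_0 + t\mathbf{u}) = f(\mathbf{x}_0) + \sum_{i=1}^{n} \left.\frac{\partial f}{\partial x_i}\right|_{\mathbf{x}_0} (t u_i) + \frac{1}{2} \sum_{i=1}^{n} \left.\frac{\partial^2 f}{\partial x_i^2}\right|_{\mathbf{x}_0 + c\mathbf{u}} (t^2 u_i^2) + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \left.\frac{\partial^2 f}{\partial x_i\,\partial x_j}\right|_{\mathbf{x}_0 + c\mathbf{u}} (t^2 u_i u_j).
\]
In particular, we have $f(\mathbf{x}_0 + \mathbf{v}) = f(\mathbf{x}_0) + \left.\nabla f\right|_{\mathbf{x}_0} \cdot \mathbf{v} + \frac{1}{2} \mathbf{v}^T \left.\mathbf{H}\right|_{\mathbf{x}_0 + k\mathbf{v}} \mathbf{v}$, for some $k$ with $0 \le k \le 1$.

Let $A$ be an open hypersphere centered at $\mathbf{x}_0 \in \mathbb{R}^n$. If $f: A \to \mathbb{R}$ has continuous second partial derivatives throughout $A$, then $f$ has a local minimum at a critical point $\mathbf{x}_0$ if $\mathbf{v}^T \left.\mathbf{H}\right|_{\mathbf{x}_0} \mathbf{v} > 0$ for all nonzero vectors $\mathbf{v}$. Similarly, $f$ has a local maximum at a critical point $\mathbf{x}_0$ if $\mathbf{v}^T \left.\mathbf{H}\right|_{\mathbf{x}_0} \mathbf{v} < 0$ for all nonzero vectors $\mathbf{v}$.

A quadratic form is an expression of the form $\mathbf{v}^T \mathbf{A} \mathbf{v}$, where $\mathbf{v}$ is a vector in $\mathbb{R}^n$, and $\mathbf{A}$ is an $n \times n$ matrix. A positive definite quadratic form is one such that $\mathbf{v}^T \mathbf{A} \mathbf{v} > 0$ for all nonzero vectors $\mathbf{v}$. Similarly, a negative definite quadratic form is one such that $\mathbf{v}^T \mathbf{A} \mathbf{v} < 0$ for all nonzero vectors $\mathbf{v}$.

For a function $f: \mathbb{R}^n \to \mathbb{R}$ having Hessian matrix $\mathbf{H}$, if $\mathbf{v}^T \left.\mathbf{H}\right|_{\mathbf{x}_0} \mathbf{v}$ is a positive definite quadratic form at a critical point $\mathbf{x}_0$, then $f$ has a local minimum at $\mathbf{x}_0$. Similarly, if $\mathbf{v}^T \left.\mathbf{H}\right|_{\mathbf{x}_0} \mathbf{v}$ is a negative definite quadratic form at a critical point $\mathbf{x}_0$, then $f$ has a local maximum at $\mathbf{x}_0$.

A symmetric matrix $\mathbf{A}$ defines a positive definite quadratic form $\mathbf{v}^T \mathbf{A} \mathbf{v}$ if and only if all of the eigenvalues of $\mathbf{A}$ are positive.

A symmetric matrix $\mathbf{A}$ defines a negative definite quadratic form $\mathbf{v}^T \mathbf{A} \mathbf{v}$ if and only if all of the eigenvalues of $\mathbf{A}$ are negative.

If $f: \mathbb{R}^n \to \mathbb{R}$ has Hessian matrix $\mathbf{H}$, and all eigenvalues of $\mathbf{H}$ are positive at a critical point $\mathbf{x}_0$, then $f$ has a local minimum at $\mathbf{x}_0$.

If $f: \mathbb{R}^n \to \mathbb{R}$ has Hessian matrix $\mathbf{H}$, and all eigenvalues of $\mathbf{H}$ are negative at a critical point $\mathbf{x}_0$, then $f$ has a local maximum at $\mathbf{x}_0$.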


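A $2 \times 2$ symmetric matrix $\mathbf{A}$ has a positive definite quadratic form $\mathbf{v}^T \mathbf{A} \mathbf{v}$ if and only if $a_{11} > 0$ and $|\mathbf{A}| > 0$. Similarly, a $2 \times 2$ symmetric matrix has a negative definite quadratic form $\mathbf{v}^T \mathbf{A} \mathbf{v}$ if and only if $a_{11} < 0$ and $|\mathbf{A}| > 0$.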

EXERCISES

1. In each part, solve for all critical points for the given function. Then, for each critical point, use the Hessian matrix to determine whether the critical point is a local maximum, a local minimum, or neither.

★ a) $f(x, y) = x^3 + x^2 + 2xy - 3x + y^2$
b) $f(x, y) = 6x^2 + 4xy + 3y^2 + 8x - 9y$
★ c) $f(x, y) = 2x^2 + 2xy + 2x + y^2 - 2y + 5$
d) $f(x, y) = x^3 + 3x^2y - x^2 + 3xy^2 + 2xy - 3x + y^3 - y^2 - 3y$ (Hint: To solve for critical points, first set $\frac{\partial f}{\partial x} - \frac{\partial f}{\partial y} = 0$.)
★ e) $f(x, y, z) = 2x^2 + 2xy + 2xz + y^4 + 4y^3z + 6y^2z^2 - y^2 + 4yz^3 - 4yz + z^4 - z^2$

2. a) Show that $f(x, y) = (x - 2)^4 + (y - 3)^2$ has a local minimum at $[2, 3]$, but its Hessian matrix at $[2, 3]$ has 0 as an eigenvalue.
b) Show that $f(x, y) = -(x - 2)^4 + (y - 3)^2$ has a critical point at $[2, 3]$, its Hessian matrix at $[2, 3]$ has all nonnegative eigenvalues, but $[2, 3]$ is not a local extreme point for $f$.
c) Show that $f(x, y) = -(x + 1)^4 - (y + 2)^4$ has a local maximum at $[-1, -2]$, but its Hessian matrix at $[-1, -2]$ is $\mathbf{O}$ and thus has all of its eigenvalues equal to zero.
d) Show that $f(x, y, z) = (x - 1)^2 - (y - 2)^2 + (z - 3)^4$ does not have any local extreme points. Then verify that its Hessian matrix has eigenvalues of opposite sign at the function’s only critical point.

3. a) Prove that a symmetric $2 \times 2$ matrix $\mathbf{A} = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$ defines a positive definite quadratic form if and only if $a > 0$ and $|\mathbf{A}| > 0$. (Hint: Compute $p_{\mathbf{A}}(x)$ and show that both roots are positive if and only if $a > 0$ and $|\mathbf{A}| > 0$.)
b) Prove that a symmetric $2 \times 2$ matrix $\mathbf{A}$ defines a negative definite quadratic form if and only if $a_{11} < 0$ and $|\mathbf{A}| > 0$.

★ 4. True or False:

a) If $f: \mathbb{R}^n \to \mathbb{R}$ has continuous second partial derivatives, then the Hessian matrix is symmetric.
b) Every symmetric matrix $\mathbf{A}$ defines either a positive definite or a negative definite quadratic form.
c) A Hessian matrix for a function with continuous second partial derivatives evaluated at any point is diagonalizable.
d) $\mathbf{v}^T \begin{bmatrix} 5 & 3 \\ 3 & 2 \end{bmatrix} \mathbf{v}$ is a positive definite quadratic form.
2 3
3 0 0
e) vT 4 0 9 0 5 v is a positive de…nite quadratic form.
0 0 4


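Answers to Selected Exercises

(1) (a) Critical points: $(1, -1)$, $(-1, 1)$; local minimum at $(1, -1)$
(c) Critical point: $(-2, 3)$; local minimum at $(-2, 3)$
(e) Critical points: $(0, 0, 0)$, $\left(\frac{1}{2}, -\frac{1}{2}, -\frac{1}{2}\right)$, $\left(-\frac{1}{2}, \frac{1}{2}, \frac{1}{2}\right)$; local minimums at $\left(\frac{1}{2}, -\frac{1}{2}, -\frac{1}{2}\right)$, $\left(-\frac{1}{2}, \frac{1}{2}, \frac{1}{2}\right)$
(4) (a) T
(b) F
(c) T
(d) T
(e) F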
