Functions of Several Variables: Unconstrained Extrema
Namely, $P_N(x_0, h)$ is the degree $N$ Taylor polynomial of $f$ at $x = x_0$, while the "error term" $f(x_0 + h) - P_N(x_0, h)$ goes to zero faster than $h^N$ as $h \to 0$. Letting $N \to \infty$, one gets the Taylor series for $f$, which may or may not converge, depending on $f$, $x_0$ and $h$:
$$f(x_0 + h) = \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!}\, h^k. \qquad (2)$$
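One can observe the convergence in (2) numerically; below is a minimal sketch (an illustration, not part of the notes), taking $f = \exp$, for which every derivative $f^{(k)}(x_0)$ equals $e^{x_0}$:

    import math

    def taylor_partial_sum(x0, h, N):
        # Degree-N partial sum of the Taylor series (2) for f = exp:
        # every derivative f^(k)(x0) equals e^{x0}.
        return sum(math.exp(x0) * h**k / math.factorial(k) for k in range(N + 1))

    x0, h = 1.0, 0.5
    for N in (1, 2, 5, 10):
        print(N, math.exp(x0 + h) - taylor_partial_sum(x0, h, N))
    # the error shrinks rapidly with N, as the series converges for f = exp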
$$\frac{d^2}{dt^2}\, g(0) = \frac{d}{dt}\big( (d \cdot \nabla) f(x_0) \big) = \sum_{i=1}^{n} d_i \frac{\partial}{\partial x_i} \Big( \sum_{j=1}^{n} d_j \frac{\partial}{\partial x_j} \Big) f(x_0) = (d \cdot \nabla)^2 f(x_0); \qquad (3)$$
$$\vdots$$
$$\frac{d^k}{dt^k}\, g(0) = \frac{d}{dt} \Big( \frac{d^{k-1}}{dt^{k-1}}\, g \Big)(0) = (d \cdot \nabla)^k f(x_0).$$
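The identities (3) are easy to test numerically. The sketch below (an illustration with an arbitrarily chosen $f$, not part of the notes) compares finite-difference derivatives of $g(t) = f(x_0 + td)$ at $t = 0$ with the directional expressions $(d \cdot \nabla) f(x_0)$ and $(d \cdot \nabla)^2 f(x_0)$, computed by hand from the partial derivatives:

    import numpy as np

    def f(x):
        return x[0]**2 * x[1] + x[1]**3   # an arbitrary smooth test function

    x0 = np.array([1.0, 2.0])
    d = np.array([3.0, 4.0]) / 5.0        # a unit direction vector
    g = lambda t: f(x0 + t * d)

    eps = 1e-5                            # finite-difference derivatives of g at t = 0
    g1 = (g(eps) - g(-eps)) / (2 * eps)
    g2 = (g(eps) - 2 * g(0) + g(-eps)) / eps**2

    grad = np.array([2 * x0[0] * x0[1], x0[0]**2 + 3 * x0[1]**2])  # first partials, by hand
    H = np.array([[2 * x0[1], 2 * x0[0]],                          # second partials, by hand
                  [2 * x0[0], 6 * x0[1]]])
    print(g1, d @ grad)   # both approximately 12.8
    print(g2, d @ H @ d)  # both approximately 11.04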
In other words, $(d \cdot \nabla)^k$ is a notational shortcut for writing sums of partial derivatives of order $k$. E.g., if the dimension $n = 2$, with $d = (d_1, d_2)$, the second derivative of $g$ is
$$\frac{d^2}{dt^2}\, g = (d \cdot \nabla)^2 f = d_1^2 \frac{\partial^2}{\partial x_1^2} f + 2 d_1 d_2 \frac{\partial^2}{\partial x_1 \partial x_2} f + d_2^2 \frac{\partial^2}{\partial x_2^2} f = d^T H d,$$
where (with yet another notation for partial derivatives)
$$H = \begin{bmatrix} f_{x_1 x_1} & f_{x_1 x_2} \\ f_{x_1 x_2} & f_{x_2 x_2} \end{bmatrix}.$$
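For a concrete instance (an illustrative example, not from the notes): if $f(x_1, x_2) = x_1^3 + x_1 x_2^2$, then $f_{x_1 x_1} = 6x_1$, $f_{x_1 x_2} = 2x_2$, $f_{x_2 x_2} = 2x_1$, so
$$H = \begin{bmatrix} 6x_1 & 2x_2 \\ 2x_2 & 2x_1 \end{bmatrix}.$$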
In $n = 3$ dimensions still $\frac{d^2}{dt^2}\, g = d^T H d$, with
$$H = \begin{bmatrix} f_{x_1 x_1} & f_{x_1 x_2} & f_{x_1 x_3} \\ f_{x_1 x_2} & f_{x_2 x_2} & f_{x_2 x_3} \\ f_{x_1 x_3} & f_{x_2 x_3} & f_{x_3 x_3} \end{bmatrix}.$$
In general, for any $n$, the second derivative $\frac{d^2}{dt^2}\, g$ in the same way equals $d^T H d$, where the matrix
$$H = \begin{bmatrix} \frac{\partial^2}{\partial x_1^2} f & \ldots & \frac{\partial^2}{\partial x_1 \partial x_n} f \\ \vdots & \ddots & \vdots \\ \frac{\partial^2}{\partial x_1 \partial x_n} f & \ldots & \frac{\partial^2}{\partial x_n^2} f \end{bmatrix}$$
(depending on $x$) is called the Hessian matrix of $f$. The Hessian matrix, representing the second derivative of the function $f$ of several variables, is therefore often denoted $D^2 f$.
In the same vein, the notation $Df = \nabla f = \operatorname{grad} f = \left( \frac{\partial}{\partial x_1} f, \ldots, \frac{\partial}{\partial x_n} f \right)$ stands for the derivative, or the gradient, of $f$. Note that given a point $x = x_0$, the derivative (gradient) of the scalar function $f$ at this point is a vector, while the second derivative (the Hessian) is a matrix.
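Both objects are easy to produce symbolically; here is a minimal sketch (assuming the sympy library is available, and reusing the illustrative $f$ from above):

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    f = x1**3 + x1 * x2**2                                # illustrative test function

    grad = sp.Matrix([sp.diff(f, v) for v in (x1, x2)])   # Df: a vector
    H = sp.hessian(f, (x1, x2))                           # D^2 f: a matrix
    print(grad.T)  # Matrix([[3*x1**2 + x2**2, 2*x1*x2]])
    print(H)       # Matrix([[6*x1, 2*x2], [2*x2, 2*x1]])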
Substituting the notations (3) into the Taylor formulae (1,2), letting $td = h$, one gets their higher-dimensional analogues:
$$f(x_0 + h) = \sum_{k=0}^{N} \frac{(h \cdot \nabla)^k f(x_0)}{k!} + o(\|h\|^N), \quad \text{with } \lim_{h \to 0} \frac{o(\|h\|^N)}{\|h\|^N} = 0, \qquad (4)$$
$$f(x_0 + h) = \sum_{k=0}^{\infty} \frac{(h \cdot \nabla)^k f(x_0)}{k!}.$$
In the first formula, the main term (the sum) is a polynomial of degree $N$ in $h = (h_1, \ldots, h_n)$, while the "error term" goes to zero faster than $\|h\|^N$ as $h \to 0$. Also, the expression $(h \cdot \nabla)^k f(x_0)$ can be written explicitly by the Newton multinomial formula. For instance, for $n = 2$ with $h = (u, v)$ one has
$$(h \cdot \nabla)^k f(x_0) = \sum_{s=0}^{k} \binom{k}{s} \frac{\partial^k}{\partial x_1^s\, \partial x_2^{k-s}} f(x_0)\; u^s v^{k-s},$$
where $\binom{k}{s}$ are the binomial coefficients, the number of combinations "choose $s$ out of $k$". More generally, for $x \in \mathbb{R}^n$, we have
$$(h \cdot \nabla)^k f(x_0) = \Big( \sum_{j=1}^{n} h_j \frac{\partial}{\partial x_j} \Big)^k f(x_0) = \sum_{j_1, \ldots, j_k = 1}^{n} h_{j_1} \cdots h_{j_k}\, \frac{\partial^k f(x_0)}{\partial x_{j_1} \cdots \partial x_{j_k}}.$$
One can also group the $n^k$ terms in the last formula and rewrite it with multinomial, rather than binomial, coefficients. Note, importantly, that practically speaking, in all the formulae above, expressions like $\frac{\partial^k f(x_0)}{\partial x_{j_1} \cdots \partial x_{j_k}}$ mean that first the partial derivative $\frac{\partial^k f(x)}{\partial x_{j_1} \cdots \partial x_{j_k}}$ is found, and then one lets $x = x_0$.
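As a consistency check (a short verification, not in the original text), for $k = 2$ the general formula reproduces the Hessian expression obtained earlier:
$$(h \cdot \nabla)^2 f(x_0) = \sum_{j_1, j_2 = 1}^{n} h_{j_1} h_{j_2}\, \frac{\partial^2 f(x_0)}{\partial x_{j_1} \partial x_{j_2}} = h^T D^2 f(x_0)\, h.$$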
We shall be interested in two truncations of the Taylor formula (4):
$$f(x_0 + h) = f(x_0) + Df(x_0) \cdot h + o(\|h\|),$$
$$f(x_0 + h) = f(x_0) + Df(x_0) \cdot h + \tfrac{1}{2}\, h^T D^2 f(x_0)\, h + o(\|h\|^2). \qquad (5)$$
The first truncation approximates the surface $y = f(x)$ in $\mathbb{R}^{n+1}$ near $x = x_0$ by its tangent plane $y - y_0 = Df(x_0) \cdot (x - x_0)$, passing through the point $(x, y) = (x_0, f(x_0))$. The second one is still more precise, approximating the surface by a paraboloid.
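The two error rates can be observed numerically; the following sketch (an illustration with an arbitrarily chosen $f$, not part of the notes) compares the tangent plane and the paraboloid approximations along a fixed direction:

    import numpy as np

    def f(x):
        return np.exp(x[0]) * np.sin(x[1])     # an arbitrary smooth test function

    x0 = np.array([0.0, 1.0])
    grad = np.array([np.sin(1.0), np.cos(1.0)])            # Df(x0), by hand
    H = np.array([[np.sin(1.0),  np.cos(1.0)],             # D^2 f(x0), by hand
                  [np.cos(1.0), -np.sin(1.0)]])

    d = np.array([1.0, 1.0]) / np.sqrt(2.0)
    for t in (1e-1, 1e-2, 1e-3):
        h = t * d
        lin = f(x0) + grad @ h                 # tangent plane value
        quad = lin + 0.5 * h @ H @ h           # paraboloid value
        print(t, abs(f(x0 + h) - lin), abs(f(x0 + h) - quad))
    # the linear error decays like t^2, the quadratic error like t^3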
Remark. Strictly speaking, it makes more sense to accept the above formulae (4,5) as a definition of differentiability of $f(x)$ at $x = x_0$. I.e., $f(x)$ is differentiable at $x = x_0$ if and only if for any $h \in \mathbb{R}^n$ one has
$$f(x_0 + h) = f(x_0) + Df(x_0) \cdot h + o(\|h\|),$$
for some vector $Df(x_0) \in \mathbb{R}^n$. The latter vector is called the derivative of $f(x)$ at $x = x_0$; the components of this vector will be the partial derivatives of $f$. The partial derivatives are defined as directional derivatives along the coordinate axes: if $d^i$ is the unit vector in the direction of the $i$th coordinate axis, $i = 1, \ldots, n$ (i.e. the components of $d^i$ are all $0$, except $1$ at the position $i$), the limit
$$\frac{\partial f}{\partial x_i}(x_0) = \lim_{t \to 0} \frac{f(x_0 + t\, d^i) - f(x_0)}{t},$$
if it exists, defines the $i$th partial derivative of $f$ at $x = x_0$.
Local extrema
1. Definition: A point $x = x_0$ is a local minimum [maximum] of $f(x)$ if for all $x$ in some neighbourhood of $x_0$, $f(x) \geq [\leq]\, f(x_0)$. In both cases, it is called a local extremum. In other words, $x = x_0$ is a local minimum [maximum] of $f(x)$ iff for all the unit direction vectors $d \in \mathbb{R}^n$, the function of one variable $g(t) = f(x_0 + td)$ has a local minimum [maximum] at $t = 0$.
2. Note: Essentially, it means that $f$ will increase [decrease] if one makes a small step away from $x = x_0$ in any feasible direction $d$. Dealing with unconstrained extrema, any direction is feasible. In the constrained case to be studied later, one will be allowed to step away from $x_0$ only in specific directions, tangent to the intersection of the surfaces corresponding to the equality constraints.
3. Definition: A point $x = x_0$ at which $Df(x_0) = 0$ is called a critical point of $f$. For a differentiable $f$, a local extremum is necessarily a critical point.
4. Definition: A critical point $x_0$ which is not a local extremum of $f$ is called a saddle point. Thus, a critical point is either a local extremum or a saddle point. Classification of local extrema can in most cases be fulfilled via the second derivative test, by looking at the term quadratic in $h$ in the second equation of (5).
5. Proposition: A critical point $x_0$ is a local minimum [maximum] if the Hessian $D^2 f(x_0)$ at this point is a positive [negative] definite quadratic form matrix. It is a saddle point if the quadratic form with the matrix $D^2 f(x_0)$ is strictly indefinite. (A numerical illustration appears after the notes below.)
Proof: Follows from the second formula of (5), where $Df(x_0) = 0$, and the ensuing definitions apropos of quadratic forms.
Note: This still leaves aside some (rare and more subtle) possibilities, dealing with the case when the Hessian matrix is singular at $x = x_0$ (i.e. it has a zero determinant). These possibilities would require a more subtle analysis, such as looking at the Hessian matrix $D^2 f(x)$ for all $x$ in some neighbourhood of $x = x_0$. (For instance, in one variable, $f(x) = x^4$, $f(x) = -x^4$ and $f(x) = x^3$ all have a vanishing second derivative at $x = 0$, which is a minimum, a maximum and a saddle point, respectively.)
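To make the second derivative test concrete, here is a small numerical sketch (the function and its critical points are illustrative, not from the notes): $f(x, y) = x^3 - 3x + y^2$ has critical points $(\pm 1, 0)$, classified below via the eigenvalues of the Hessian.

    import numpy as np

    def hessian(x, y):
        # Hessian of f(x, y) = x**3 - 3*x + y**2, computed by hand
        return np.array([[6.0 * x, 0.0],
                         [0.0,     2.0]])

    for (x, y) in [(1.0, 0.0), (-1.0, 0.0)]:     # the two critical points
        lam = np.linalg.eigvalsh(hessian(x, y))  # eigenvalues of the symmetric Hessian
        if np.all(lam > 0):
            kind = 'local minimum'
        elif np.all(lam < 0):
            kind = 'local maximum'
        elif lam.min() < 0 < lam.max():
            kind = 'saddle point'
        else:
            kind = 'inconclusive (singular Hessian)'
        print((x, y), lam, kind)   # (1, 0): local minimum; (-1, 0): saddle point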
Note: A quadratic form matrix $Q$ is positive [negative] definite iff all of its eigenvalues $\lambda_i$ are positive [negative], and strictly indefinite iff it has eigenvalues of both signs. Indeed, let $u^1, \ldots, u^n$ be an orthonormal basis of eigenvectors of $Q$, with the corresponding eigenvalues $\lambda_1, \ldots, \lambda_n$, and decompose $h = \sum_{i=1}^{n} h_i u^i$, with some coefficients $h_i$. Then
$$h \cdot Qh = \Big( \sum_{i=1}^{n} h_i u^i \Big) \cdot Q \Big( \sum_{j=1}^{n} h_j u^j \Big) = \sum_{i,j=1}^{n} \lambda_j h_i h_j\, (u^i \cdot u^j) = \sum_{i=1}^{n} \lambda_i h_i^2,$$
as on the last step $u^i \cdot u^j = 0$ whenever $i \neq j$, while $u^i \cdot u^i = \|u^i\|^2 = 1$. This proves the statement, as the right hand side is positive [negative] for all $h \neq 0$ iff all $\lambda_i$ are positive [negative], while in the strictly indefinite case one can take $h = u^1$ and $h = u^2$, where the corresponding eigenvalues satisfy $\lambda_1 < 0$ and $\lambda_2 > 0$.
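A minimal concrete illustration of the indefinite case:
$$Q = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad u^1 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix},\ \lambda_1 = -1, \qquad u^2 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix},\ \lambda_2 = 1,$$
so $h = u^1$ gives $h \cdot Qh = -1 < 0$, while $h = u^2$ gives $h \cdot Qh = 1 > 0$.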
Note: Any quadratic form matrix $Q$ has two important coordinate-independent invariants: its determinant $\det Q = \prod_{i=1}^{n} \lambda_i$ and trace $\operatorname{Tr} Q = \sum_{i=1}^{n} \lambda_i = \sum_{i=1}^{n} q_{ii}$, the sum of all the diagonal elements of $Q$. Calculating the determinant and the trace can be very informative apropos of the above eigenvalue criterion and fully does the job if the dimension $n = 2$.
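Indeed (a short elaboration), for $n = 2$ one has $\det Q = \lambda_1 \lambda_2$ and $\operatorname{Tr} Q = \lambda_1 + \lambda_2$, so, assuming $\det Q \neq 0$:
$$\det Q > 0,\ \operatorname{Tr} Q > 0 \;\Longrightarrow\; \lambda_1, \lambda_2 > 0 \quad \text{(positive definite)};$$
$$\det Q > 0,\ \operatorname{Tr} Q < 0 \;\Longrightarrow\; \lambda_1, \lambda_2 < 0 \quad \text{(negative definite)};$$
$$\det Q < 0 \;\Longrightarrow\; \lambda_1 \lambda_2 < 0 \quad \text{(strictly indefinite)}.$$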
Note: If $\det Q = 0$, then at least one of its eigenvalues is zero, and in this case the second derivative test fails to determine the type of the critical point. Otherwise, instead of finding all the eigenvalues of $Q$, one can use the following criterion, due to Sylvester. The matrix $Q$ is
• positive definite iff $\det M_k > 0$ for all the leading minors $M_k$ of $Q$, $k = 1, \ldots, n$;
• negative definite iff $(-1)^k \det M_k > 0$ for all the leading minors $M_k$ of $Q$, $k = 1, \ldots, n$;
• strictly indefinite if none of the above holds.
Note: The proof of the first bullet can be found in a linear algebra textbook. The second bullet easily follows from the first one: if $Q$ is negative definite, then $-Q$ should be positive definite. However, $\det(-M_k) = (-1)^k \det M_k$, so the even order leading minors of $-Q$ (corresponding to $k = 2, 4, \ldots$) coincide with those of $Q$, while the odd order ones (corresponding to $k = 1, 3, \ldots$) change sign.
The third bullet also follows easily, because if $\det Q \neq 0$, all the eigenvalues are nonzero, and then the only alternative to their being all positive [negative] is the existence of a pair of eigenvalues with opposite signs.
Note: The Sylvester criterion works only if $\det Q \neq 0$. For two dimensions, with $Q = \begin{pmatrix} A & B \\ B & C \end{pmatrix}$ as above, positive definiteness of $Q$ means $A > 0$ and $AC - B^2 > 0$ (and $A < 0$, $AC - B^2 > 0$ for negative definiteness). If $AC - B^2 < 0$, the quadratic form is strictly indefinite.
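The criterion is straightforward to implement; below is a minimal sketch (an illustration, not part of the notes), applying the three bullets to a symmetric matrix $Q$:

    import numpy as np

    def classify(Q):
        # Sylvester's criterion via the leading minors M_1, ..., M_n of Q.
        n = Q.shape[0]
        minors = [np.linalg.det(Q[:k, :k]) for k in range(1, n + 1)]
        if abs(minors[-1]) < 1e-12:        # det Q = 0: the test is inconclusive
            return 'criterion inapplicable (det Q = 0)'
        if all(m > 0 for m in minors):
            return 'positive definite'
        if all((-1)**k * m > 0 for k, m in enumerate(minors, start=1)):
            return 'negative definite'
        return 'strictly indefinite'

    # The two-dimensional case Q = [[A, B], [B, C]]: M_1 = A, M_2 = AC - B^2.
    print(classify(np.array([[2.0, 1.0], [1.0, 3.0]])))    # positive definite
    print(classify(np.array([[-2.0, 1.0], [1.0, -3.0]])))  # negative definite
    print(classify(np.array([[1.0, 2.0], [2.0, 1.0]])))    # strictly indefinite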