
Functions of several variables: unconstrained extrema

Taylor formula in one dimension


Consider f : R → R, possessing N continuous derivatives at x = x0 . Then
f(x_0 + h) = \sum_{k=0}^{N} \frac{f^{(k)}(x_0)}{k!}\, h^k + o(h^N) \equiv P_N(x_0, h) + o(h^N), \qquad \text{with } \lim_{h \to 0} \frac{o(h^N)}{h^N} = 0.    (1)

Namely, P_N(x_0, h) is the degree N Taylor polynomial of f at x = x_0, while the “error term” f(x_0 + h) − P_N(x_0, h) goes to zero faster than h^N as h → 0. Letting N → ∞, one gets the Taylor series for f, which may or may not converge, depending on f, x_0 and h.

f(x_0 + h) = \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!}\, h^k.    (2)
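As a quick numerical illustration of formula (1), the sketch below (a minimal example; the choice f(x) = e^x, expanded at x_0 = 0 so that every derivative equals 1, is only for illustration) builds P_N(x_0, h) and checks that the error f(x_0 + h) − P_N(x_0, h) vanishes faster than h^N:

import math

def taylor_poly(derivs_at_x0, h):
    # P_N(x0, h) = sum_{k=0}^{N} f^(k)(x0) / k! * h^k, given the derivatives at x0
    return sum(dk * h**k / math.factorial(k) for k, dk in enumerate(derivs_at_x0))

# Example: f(x) = e^x at x0 = 0; every derivative of e^x at 0 equals 1
N = 3
derivs = [1.0] * (N + 1)
for h in [0.1, 0.01, 0.001]:
    error = math.exp(h) - taylor_poly(derivs, h)
    print(h, error, error / h**N)   # the ratio error / h^N tends to 0 as h -> 0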

Taylor formula in several dimensions


Consider f : R^n → R, possessing continuous partial derivatives up to order N at x = x_0 = (x_{01}, . . . , x_{0n}). Fix a unit direction vector d = (d_1, . . . , d_n), with \|d\| = \sqrt{\sum_{i=1}^{n} d_i^2} = 1. Consider the function g(t) = f(x_0 + td) of one variable t ∈ (−1, 1), with g(0) = f(x_0) (step away from x_0 by the amount t in the direction d). The function g(t) can be expanded into the Taylor series (1,2) in t, using the fact that by the Chain rule \frac{d}{dt} = d_1 \frac{\partial}{\partial x_1} + \ldots + d_n \frac{\partial}{\partial x_n} = d \cdot \nabla, i.e.

\frac{dg}{dt}(0) = \sum_{i=1}^{n} d_i \frac{\partial f}{\partial x_i}(x_0) = (d \cdot \nabla) f(x_0);

\frac{d^2 g}{dt^2}(0) = \frac{d}{dt}\Big( (d \cdot \nabla) f(x_0) \Big) = \sum_{i=1}^{n} d_i \frac{\partial}{\partial x_i} \Big( \sum_{j=1}^{n} d_j \frac{\partial f}{\partial x_j} \Big)(x_0) = (d \cdot \nabla)^2 f(x_0);    (3)

. . .

\frac{d^k g}{dt^k}(0) = \frac{d}{dt}\Big( \frac{d^{k-1} g}{dt^{k-1}} \Big)(0) = (d \cdot \nabla)^k f(x_0).

In other words, (d · ∇)^k is a notational shortcut for writing sums of partial derivatives of order k. E.g. if the dimension n = 2, with d = (d_1, d_2), the second derivative of g is

\frac{d^2 g}{dt^2} = (d \cdot \nabla)^2 f = d_1^2 \frac{\partial^2 f}{\partial x_1^2} + 2 d_1 d_2 \frac{\partial^2 f}{\partial x_1 \partial x_2} + d_2^2 \frac{\partial^2 f}{\partial x_2^2} = d^T H d,

where (with yet another notation for partial derivatives)

H = \begin{bmatrix} f_{x_1 x_1} & f_{x_1 x_2} \\ f_{x_1 x_2} & f_{x_2 x_2} \end{bmatrix}.

In n = 3 dimensions still \frac{d^2 g}{dt^2} = d^T H d, with

H = \begin{bmatrix} f_{x_1 x_1} & f_{x_1 x_2} & f_{x_1 x_3} \\ f_{x_1 x_2} & f_{x_2 x_2} & f_{x_2 x_3} \\ f_{x_1 x_3} & f_{x_2 x_3} & f_{x_3 x_3} \end{bmatrix}.

In general, for any n the second derivative \frac{d^2 g}{dt^2} in the same way equals d^T H d, where the matrix

H = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \ldots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & & \vdots \\ \frac{\partial^2 f}{\partial x_1 \partial x_n} & \ldots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}

(depending on x) is called the Hessian matrix of f. The Hessian matrix, representing the second derivative of the function of several variables f, is therefore often denoted as D^2 f.
In the same vein, the notation Df = \nabla f = \mathrm{grad}\, f = \Big( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \Big) stands for the derivative, or the gradient, of f. Note that given a point x = x_0, the derivative (gradient) of the scalar function f at this point is a vector, while the second derivative (the Hessian) is a matrix.
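To make this notation concrete, here is a small symbolic sketch (using sympy; the function f, the point x_0 and the direction d are arbitrary illustrative choices) that computes the gradient and the Hessian and verifies that \frac{d^2 g}{dt^2}(0) = d^T H(x_0)\, d for g(t) = f(x_0 + td):

import sympy as sp

x1, x2, t = sp.symbols('x1 x2 t')
f = x1**3 + x1 * x2**2 + sp.exp(x2)                     # illustrative choice of f
grad = sp.Matrix([sp.diff(f, x1), sp.diff(f, x2)])      # Df = (f_{x1}, f_{x2})
H = sp.hessian(f, (x1, x2))                             # D^2 f, the Hessian matrix

x0 = {x1: 1, x2: 2}                                     # expansion point (arbitrary)
d = sp.Matrix([sp.Rational(3, 5), sp.Rational(4, 5)])   # unit direction: (3/5)^2 + (4/5)^2 = 1

# g(t) = f(x0 + t d); its second derivative at t = 0 should equal d^T H(x0) d
g = f.subs({x1: 1 + t * d[0], x2: 2 + t * d[1]})
lhs = sp.diff(g, t, 2).subs(t, 0)
rhs = (d.T * H.subs(x0) * d)[0, 0]
assert sp.simplify(lhs - rhs) == 0
print(grad.subs(x0), H.subs(x0))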
Substituting the notations (3) into the Taylor formulae (1,2), letting td = h, one gets their higher-dimensional analogues:

f(x_0 + h) = \sum_{k=0}^{N} \frac{(h \cdot \nabla)^k f(x_0)}{k!} + o(\|h\|^N), \qquad \text{with } \lim_{h \to 0} \frac{o(\|h\|^N)}{\|h\|^N} = 0,    (4)

f(x_0 + h) = \sum_{k=0}^{\infty} \frac{(h \cdot \nabla)^k f(x_0)}{k!}.

In the first formula, the main term (the sum) is a polynomial of degree N in h = (h_1, . . . , h_n), and the “error term” goes to zero faster than \|h\|^N as h → 0. Also, the expression (h · ∇)^k f(x_0) can be written explicitly by the Newton multinomial formula. For instance, for n = 2 with h = (u, v) one has

(h \cdot \nabla)^k f(x_0) = \sum_{s=0}^{k} \binom{k}{s} \frac{\partial^k f(x_0)}{\partial x_1^s \, \partial x_2^{k-s}} \, u^s v^{k-s},

where \binom{k}{s} are the binomial coefficients, the number of combinations “choose s out of k”. More generally, for x ∈ R^n, we have

(h \cdot \nabla)^k f(x_0) = \Big( \sum_{j=1}^{n} h_j \frac{\partial}{\partial x_j} \Big)^k f(x_0) = \sum_{j_1, \ldots, j_k = 1}^{n} h_{j_1} \ldots h_{j_k} \frac{\partial^k f(x_0)}{\partial x_{j_1} \ldots \partial x_{j_k}}.

One can also group the n^k terms in the last formula and rewrite it with multinomial, rather than binomial, coefficients. Note, importantly, that practically speaking, in all the formulae above an expression like \frac{\partial^k f(x_0)}{\partial x_{j_1} \ldots \partial x_{j_k}} means that first the partial derivative \frac{\partial^k f(x)}{\partial x_{j_1} \ldots \partial x_{j_k}} is found, and then one lets x = x_0.
We shall be interested in two truncations of the Taylor formula (4):

f(x_0 + h) ≈ f(x_0) + Df(x_0) \cdot h,   and

f(x_0 + h) ≈ f(x_0) + Df(x_0) \cdot h + \tfrac{1}{2}\, h^T [D^2 f(x_0)]\, h.    (5)

The first truncation approximates the surface y = f(x) in R^{n+1} near x = x_0 by its tangent plane y − y_0 = Df(x_0) \cdot (x − x_0), passing through the point (x, y) = (x_0, f(x_0)). The second one is still more precise, approximating the surface by a paraboloid.
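As a numerical sanity check of the two truncations in (5) (a minimal sketch; the function f, the point x_0 and the step direction are arbitrary illustrative choices), one can compare f(x_0 + h) with the tangent-plane and paraboloid approximations and watch the errors shrink roughly like \|h\|^2 and \|h\|^3 for a smooth f:

import numpy as np

def f(x):
    return x[0]**3 + x[0] * x[1]**2 + np.exp(x[1])       # illustrative f

def grad_f(x):                                            # Df(x) = (f_{x1}, f_{x2})
    return np.array([3 * x[0]**2 + x[1]**2, 2 * x[0] * x[1] + np.exp(x[1])])

def hess_f(x):                                            # D^2 f(x), the Hessian
    return np.array([[6 * x[0],               2 * x[1]],
                     [2 * x[1], 2 * x[0] + np.exp(x[1])]])

x0 = np.array([1.0, 2.0])
d = np.array([0.6, 0.8])                                  # unit direction
for s in [0.1, 0.01, 0.001]:
    h = s * d
    lin = f(x0) + grad_f(x0) @ h                          # tangent-plane (first-order) approximation
    quad = lin + 0.5 * h @ hess_f(x0) @ h                 # paraboloid (second-order) approximation
    print(s, abs(f(x0 + h) - lin), abs(f(x0 + h) - quad))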

Remark. Strictly speaking, it makes more sense to accept the above formulas (4,5) as a definition of
differentiability of f (x) at x = x0 . I.e. f (x) is differentiable at x = x0 if and only if for any h ∈ Rn one
has
f(x_0 + h) = f(x_0) + Df(x_0) \cdot h + o(\|h\|),
for some vector Df (x0 ) ∈ Rn . The latter vector is called the derivative of f (x) at x = x0 ; the components
of this vector will be partial derivatives of f . The partial derivatives are defined as directional derivatives
along the coordinate axes: if di is a unit vector in the direction of the ith coordinate axis, i = 1, . . . , n,
i.e. the components of di are all 0, except 1 at the position i, the limit

\frac{\partial f}{\partial x_i}(x_0) = \lim_{t \to 0} \frac{f(x_0 + t d_i) - f(x_0)}{t}

is called the ith partial derivative of f(x) at x = x_0, also often denoted as f_{x_i}(x_0). Higher order partial derivatives are defined in the same way. There is a subtlety here: f is N times differentiable at x = x_0 if and only if it has all the partial derivatives up to order N and they are continuous, in which case the order of differentiation for mixed partial derivatives does not matter.
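The limit definition can be probed numerically; the sketch below (an illustration only, reusing the same example f as above) approximates \partial f / \partial x_1 at x_0 by the difference quotient with a shrinking step t and compares it to the analytic value:

import numpy as np

def f(x):
    return x[0]**3 + x[0] * x[1]**2 + np.exp(x[1])   # same illustrative f as above

x0 = np.array([1.0, 2.0])
e1 = np.array([1.0, 0.0])                            # unit vector along the first coordinate axis
exact = 3 * x0[0]**2 + x0[1]**2                      # analytic f_{x1}(x0) for this particular f
for t in [1e-2, 1e-4, 1e-6]:
    approx = (f(x0 + t * e1) - f(x0)) / t            # difference quotient from the limit definition
    print(t, approx, abs(approx - exact))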

Local extrema
1. Definition: A point x = x_0 is a local minimum [maximum] of f(x) if for all x in some neighbourhood of x_0, f(x) ≥ [≤] f(x_0). In both cases, it is called a local extremum. In other words, x = x_0 is a local minimum [maximum] of f(x) iff for all the unit direction vectors d ∈ R^n, the function of one variable g(t) = f(x_0 + td) has a local minimum [maximum] at t = 0.

2. Note: Essentially, it means that f will increase [decrease] if one makes a small step away from x = x_0 in any feasible direction d. Dealing with unconstrained extrema, any direction is feasible. In a constrained case, to be studied later, one will be allowed to step away from x_0 only in specific directions, tangent to the intersection of the surfaces corresponding to the equality constraints.

3. Proposition: If f(x) is defined and differentiable in a neighbourhood of x = x_0 and has an extremum at x_0, then Df(x_0) = 0. Any point x ∈ R^n such that Df(x) = 0 is called a critical, or a stationary, point of f.
Proof: Otherwise, if Df(x_0) ≠ 0, f would increase if one steps away from x_0 in the direction of grad f(x_0) and decrease in the opposite direction; hence it cannot have an extremum at x_0.
Note: Df(x_0) = 0 means that the plane tangent to the surface y = f(x) in R^{n+1} at x = x_0 is horizontal (y being the vertical direction).
Remark: The other two possibilities are that f is not defined in a full neighbourhood of x_0, the latter then being a boundary point for the domain of f, or that Df(x_0) does not exist, in which case x_0 is referred to as a singular point of f.

4. Definition: A critical point x_0 which is not a local extremum of f is called a saddle point. Thus, a critical point is either a local extremum or a saddle point. Classification of local extrema can in most cases be carried out via the second derivative test, by looking at the term quadratic in h in the second equation of (5).

5. Proposition: A critical point x_0 is a local minimum [maximum] if the Hessian D^2 f(x_0) at this point is the matrix of a positive [negative] definite quadratic form. It is a saddle point if the quadratic form with the matrix D^2 f(x_0) is strictly indefinite.
Proof: Follows from the second formula of (5), where Df(x_0) = 0, and the ensuing definitions apropos of quadratic forms.
Note: This still leaves open some (rare and more subtle) possibilities, dealing with the case when the Hessian matrix is singular at x = x_0 (i.e. it has a zero determinant). These possibilities would require a more subtle analysis, such as looking at the Hessian matrix D^2 f(x) for all x in some neighbourhood of x = x_0. A small computational illustration of the second derivative test is given right after this list.
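As an illustration of the second derivative test (a minimal sketch; the function f(x_1, x_2) = x_1^3 − 3x_1 + x_2^2 is an arbitrary two-variable example, and the classification uses the eigenvalue criterion discussed in the next section), one can find the critical points symbolically and classify each one:

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**3 - 3 * x1 + x2**2                        # illustrative example

grad = [sp.diff(f, v) for v in (x1, x2)]
critical = sp.solve(grad, (x1, x2), dict=True)    # points where Df = 0

H = sp.hessian(f, (x1, x2))
for pt in critical:
    eigs = list(H.subs(pt).eigenvals())           # eigenvalues of the Hessian at the critical point
    if all(e > 0 for e in eigs):
        kind = 'local minimum'
    elif all(e < 0 for e in eigs):
        kind = 'local maximum'
    elif any(e > 0 for e in eigs) and any(e < 0 for e in eigs):
        kind = 'saddle point'
    else:
        kind = 'second derivative test inconclusive (singular Hessian)'
    print(pt, kind)

For this particular f, the script reports a local minimum at (1, 0) and a saddle point at (−1, 0).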

Sign-definite and indefinite matrices: quadratic forms


Let h = (u, v) (two dimensions), and let Q = \begin{bmatrix} A & B \\ B & C \end{bmatrix} be a 2 × 2 symmetric matrix.
A quadratic form in two variables u, v is an expression of the form
Q(u, v) = A u^2 + 2 B u v + C v^2 = h^T Q h.

Let h = (u, v, w) (three dimensions), and let Q = \begin{bmatrix} A & B & C \\ B & D & E \\ C & E & F \end{bmatrix} be a 3 × 3 symmetric matrix.
A quadratic form in three variables u, v, w is an expression of the form
Q(u, v, w) = A u^2 + D v^2 + F w^2 + 2 B u v + 2 C u w + 2 E v w = h^T Q h.
1. Definition: Let h = (h_1, . . . , h_n) ∈ R^n. Let Q be an n × n symmetric matrix, i.e. Q^T = Q. A quadratic form in n variables h_1, . . . , h_n is an expression of the form
Q(h) = Q(h_1, . . . , h_n) = h^T Q h = h · Qh.

2. Definition: A quadratic form Q(h) is said to be

• positive semi-definite if h · Qh ≥ 0, ∀h ≠ 0;
• negative semi-definite if h · Qh ≤ 0, ∀h ≠ 0;
• positive definite if h · Qh > 0, ∀h ≠ 0;
• negative definite if h · Qh < 0, ∀h ≠ 0;
• strictly indefinite if ∃ h_1, h_2 ≠ 0 : h_1 · Qh_1 < 0, h_2 · Qh_2 > 0.

Note: The form Q is (semi-)negative definite iff the form −Q is (semi-)positive definite.
3. Eigenvalue criterion: A quadratic form Q(h_1, . . . , h_n) with the matrix Q is positive [negative] definite iff all the eigenvalues of Q are strictly positive [negative]. It is strictly indefinite iff there exist one strictly negative and one strictly positive eigenvalue of the matrix Q.
Proof: Any symmetric matrix Q has n mutually orthogonal, unit-length eigenvectors u^i, i = 1, . . . , n, corresponding to the eigenvalues λ_i, which are all real, but not necessarily distinct. Namely, one has Qu^i = λ_i u^i, and the number of linearly independent eigenvectors corresponding to the same eigenvalue λ_i is called the latter's (geometric) multiplicity. In any case, the eigenvectors u^i can be chosen¹ to form an orthogonal basis in R^n, and any vector h ∈ R^n can be expanded in this basis as

h = \sum_{i=1}^{n} h_i u^i,   with some coefficients h_i. Then

h \cdot Qh = \Big( \sum_{i=1}^{n} h_i u^i \Big) \cdot Q\Big( \sum_{i=1}^{n} h_i u^i \Big) = \sum_{i,j=1}^{n} \lambda_i h_i h_j \,[u^i \cdot u^j] = \sum_{i=1}^{n} \lambda_i h_i^2,

as on the last step u^i \cdot u^j = 0 whenever i ≠ j, while u^i \cdot u^i = \|u^i\|^2 = 1. This proves the statement, as the right hand side is positive [negative] for all h ≠ 0 iff all λ_i are positive [negative], while in the strictly indefinite case one can take h_1 = u^1, h_2 = u^2, where the corresponding eigenvalues λ_1 < 0 and λ_2 > 0.

¹ These facts come from linear algebra. For instance, how does one prove that a real symmetric matrix has all real eigenvalues and that its eigenvectors, corresponding to different eigenvalues, are orthogonal? If * denotes the complex conjugate, take the formula Qu = λu and dot it with u^*. As u ≠ 0, λ = (u^* · Qu)/(u^* · u). The denominator is real, and so is the numerator, because (u^* · Qu)^* = u · Qu^* = u^* · Qu. The first step used the fact that Q is real, the second that it is symmetric. So λ is real, and u can be chosen real as well. Now suppose λ_1, λ_2 are two distinct eigenvalues and u^1, u^2 are the corresponding eigenvectors, so Qu^1 = λ_1 u^1 and Qu^2 = λ_2 u^2. Take the dot product of the first expression with u^2, of the second one with u^1, and subtract. By symmetry of Q, 0 = u^2 · Qu^1 − u^1 · Qu^2 = (λ_1 − λ_2)(u^1 · u^2). So u^1 · u^2 = 0, as λ_1 ≠ λ_2. If some λ is a multiple eigenvalue of multiplicity k, one can also choose k orthogonal eigenvectors corresponding to it, but proving this is harder.

Note: Any quadratic form matrix Q has two important coordinate-independent invariants: its determinant \det Q = \prod_{i=1}^{n} \lambda_i and its trace \mathrm{Tr}\, Q = \sum_{i=1}^{n} \lambda_i = \sum_{i=1}^{n} q_{ii}, the sum of all the diagonal elements of Q. Calculating the determinant and the trace can be very informative apropos of the above eigenvalue criterion, and fully does the job if the dimension n = 2.
Note: If \det Q = 0, then at least one of the eigenvalues of Q is zero, and in this case the second derivative test fails to determine the type of the critical point. Otherwise, instead of finding all the eigenvalues of Q, one can use the following criterion, due to Sylvester.

4. Definition: Given an n × n matrix Q, a leading minor M_k of Q, of order k = 1, . . . , n, is the determinant of the upper left k × k submatrix of Q.
Sylvester criterion: A quadratic form Q(h_1, . . . , h_n) with the matrix Q, such that \det Q ≠ 0, is

• positive definite iff M_k > 0 for all the leading minors M_k of Q, k = 1, . . . , n;
• negative definite iff (−1)^k M_k > 0 for all the leading minors M_k of Q, k = 1, . . . , n;
• strictly indefinite if none of the above holds.

Note: The proof of the first bullet can be found in a linear algebra textbook. The second bullet easily follows from the first one: if Q is negative definite, then −Q is positive definite. However, the even order leading minors of −Q (corresponding to k = 2, 4, . . .) coincide with those of Q, because each minor is a sum of products of k entries, so the k minus signs multiply up to 1 for even k, and to −1 for the odd order leading minors (corresponding to k = 1, 3, . . .), which flips their sign.
The third bullet also follows easily, because if \det Q ≠ 0, all the eigenvalues are nonzero, and then the only alternative to their being all positive [all negative] is the existence of a pair of eigenvalues with opposite signs.
Note: The Sylvester criterion works only if \det Q ≠ 0. For two dimensions (above), positive definiteness of Q means A > 0 and AC − B^2 > 0 (and −A > 0, AC − B^2 > 0 for negative definiteness). If AC − B^2 < 0, the quadratic form is strictly indefinite.
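A minimal numerical sketch of both criteria (the matrix Q below is an arbitrary illustrative choice, not taken from the text; numpy is assumed) classifies a nonsingular symmetric matrix via its leading minors and cross-checks the answer against the eigenvalue criterion:

import numpy as np

def classify(Q):
    # Classify the symmetric matrix Q of the quadratic form h^T Q h (assumes det Q != 0)
    eigs = np.linalg.eigvalsh(Q)                                           # real eigenvalues of a symmetric matrix
    minors = [np.linalg.det(Q[:k, :k]) for k in range(1, Q.shape[0] + 1)]  # leading minors M_k
    if np.isclose(minors[-1], 0.0):
        return 'singular matrix: the tests above are inconclusive'
    if all(m > 0 for m in minors):                                         # Sylvester: M_k > 0 for all k
        kind = 'positive definite'
    elif all((-1) ** k * m > 0 for k, m in enumerate(minors, start=1)):    # (-1)^k M_k > 0 for all k
        kind = 'negative definite'
    else:
        kind = 'strictly indefinite'
    # cross-check against the eigenvalue criterion
    assert (kind == 'positive definite') == bool(np.all(eigs > 0))
    assert (kind == 'negative definite') == bool(np.all(eigs < 0))
    return kind

Q = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
print(classify(Q), np.linalg.eigvalsh(Q))      # positive definite; all eigenvalues > 0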
