Note 2: Math Camp
Note 2 is based on de la Fuente (2000, Ch. 4) and Simon and Blume (1994, Ch. 14).
This note introduces the concept of differentiability and discusses some of its main implications. We start with real-valued functions of only one argument, and then extend the notion of differentiability to multivariate functions: the key to the extension lies in the interpretation of differentiability in terms of the existence of a "good" linear approximation to a function around a point. We also show that important aspects of the local behavior of "sufficiently differentiable" functions are captured accurately by linear or quadratic approximations. The material has important applications to optimization and comparative statics.
Differentiability of Univariate Real-Valued Functions

Let f : R → R and consider the limit

    lim_{h→0} [f(x + h) − f(x)] / h,  with h ∈ R,

whenever it exists; we can interpret it as the slope of the function at the point x. When it does exist, we say the value of the limit is the derivative of f at x, written f'(x), Df(x) or D_x f(x). If f is differentiable at each point in its domain, we say the function is differentiable.
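The limit definition can be checked numerically: the difference quotient approaches the derivative as h shrinks. A minimal sketch (the function f(x) = x^3 and the evaluation point are illustrative choices, not from the text):

```python
# Difference quotient (f(x+h) - f(x)) / h for f(x) = x**3 at x = 2.
# The exact derivative is f'(2) = 3 * 2**2 = 12.
def f(x):
    return x ** 3

x = 2.0
for h in [1e-1, 1e-3, 1e-5]:
    quotient = (f(x + h) - f(x)) / h
    print(h, quotient)  # approaches 12 as h shrinks
```

Each step size cuts the gap to the true slope, which is exactly what the limit asserts.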
The value of a differentiable function near some initial point can be nicely approximated through Taylor's expansion.

Theorem. (Taylor's formula) Let f be n times differentiable on an interval containing x and x + h. Then

    f(x + h) = f(x) + Σ_{k=1}^{n−1} [f^(k)(x) / k!] h^k + R_n(h),

where f^(k)(x) is the kth derivative of f evaluated at x, and the remainder or error term R_n(h) is of the form

    R_n(h) = [f^(n)(x + λh) / n!] h^n

for some λ ∈ (0, 1). That is, the remainder has the same form as the other terms, except that the nth derivative is evaluated at some point between x and x + h.
Proof. Put y = x + h and define the function F(z), for z between x and y, by

    F(z) = f(y) − f(z) − Σ_{k=1}^{n−1} [f^(k)(z) / k!] (y − z)^k.

Then the theorem says that, for some point x + λ(y − x) between x and y,

    F(x) = [f^(n)(x + λ(y − x)) / n!] (y − x)^n.
Now define

    G(z) = F(z) − [(y − z) / (y − x)]^n F(x)

and observe that G is continuous between x and y and differentiable on the open interval (x, y), with G(x) = G(y) = 0 and

    G'(z) = F'(z) + n [(y − z) / (y − x)]^{n−1} [1 / (y − x)] F(x).    (2)

By Rolle's Theorem (see SB, p. 824), there exists some λ ∈ (0, 1) such that

    G'[x + λ(y − x)] = 0.

Since differentiating F term by term gives F'(z) = −[f^(n)(z) / (n − 1)!] (y − z)^{n−1}, evaluating (2) at z = x + λ(y − x) and solving for F(x) yields the desired expression.
The differentiability of f implies that the error term will be small. Hence the linear function on the right-hand side of (3) is guaranteed to be a decent approximation to f near x. Higher-order approximations that use more derivatives will be even better.
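The claim that higher-order approximations do better can be seen numerically. A sketch using f(x) = e^x around x = 0 (the function and step are illustrative choices):

```python
import math

# Taylor approximations of f(x) = exp(x) around x = 0:
# linear:    1 + h         (error of order h**2)
# quadratic: 1 + h + h**2/2 (error of order h**3)
h = 0.1
exact = math.exp(h)
linear = 1 + h
quadratic = 1 + h + h ** 2 / 2
print(abs(exact - linear))     # roughly 5e-3
print(abs(exact - quadratic))  # roughly 2e-4, an order of magnitude smaller
```

Shrinking h makes the gap between the two approximations even more pronounced, since their errors vanish at different rates.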
Let g and h be two real-valued functions on R. The function formed by first applying g to a number x and then applying h to the result g(x) is called the composition of g and h, and is written

    f(x) = (h ∘ g)(x) = h[g(x)].
Example 1. The functions which describe a firm's behavior, such as its profit function π, are usually written as functions of the firm's output y. If one wants to study the dependence of a firm's profit on the amount of an input x it uses, one must compose the profit function with the firm's production function y = f(x). The latter function tells us how much output y the firm can obtain from x units of the given input. The result is a function

    F(x) = π[f(x)].

Notice that we use different names for π and F, as their arguments are different. N
The derivative of a composite function is obtained as the derivative of the outside function (evaluated at the inside function) times the derivative of the inside function. This general form is called the Chain Rule.

Theorem 3. (Chain Rule for univariate functions) Let g and h be two real-valued differentiable functions on R, and define f(x) = h[g(x)]. Then

    df/dx (x) = h'[g(x)] g'(x).
Example 2. Consider the model in Example 1 with profit function π(y) = y^4 + 6y^2 and production function f(x) = 5x^{2/3}. By using the Chain Rule we get

    F'(x) = π'[f(x)] f'(x) = {4[f(x)]^3 + 12 f(x)} (10/3) x^{−1/3}
          = [4(5x^{2/3})^3 + 12(5x^{2/3})] (10/3) x^{−1/3}
          = (500x^2 + 60x^{2/3}) (10/3) x^{−1/3}
          = (5000/3) x^{5/3} + 200 x^{1/3}. N
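The Chain Rule computation in Example 2 can be verified numerically, assuming the profit function is π(y) = y^4 + 6y^2 and the production function is f(x) = 5x^{2/3}, as implied by the derivatives in the example:

```python
# Numerical check of Example 2: compare a central-difference estimate of
# F'(x) with the closed form (5000/3) x**(5/3) + 200 x**(1/3).
def f(x):
    return 5 * x ** (2 / 3)

def profit(y):
    return y ** 4 + 6 * y ** 2

def F(x):
    return profit(f(x))

x, h = 1.0, 1e-6
numeric = (F(x + h) - F(x - h)) / (2 * h)   # central difference quotient
formula = 5000 / 3 * x ** (5 / 3) + 200 * x ** (1 / 3)
print(numeric, formula)  # both close to 1866.67
```

The central difference agrees with the Chain Rule formula to many decimal places, which is a cheap sanity check for any hand-computed derivative.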
Differentiability of Multivariate Real-Valued Functions

In this section we study differentiability of functions from R^n into R. We build this concept on the results derived for real-valued univariate functions.
The directional derivative of f at x in the direction v is defined as

    lim_{h→0} [f(x + hv) − f(x)] / h,  with h ∈ R and ‖v‖ = 1,

whenever it exists. Directional derivatives in the direction of the coordinate axes are of special interest. The partial derivative of f with respect to its ith argument, ∂f(x)/∂xi, is defined as

    lim_{h→0} [f(x + h e_i) − f(x)] / h,  with h ∈ R,

whenever it exists, where e_i is a vector whose components are all zero except for the ith one, which is 1. [Other usual ways to write ∂f(x)/∂xi are D_{xi} f(x) and f_{xi}(x).]
Example 3. Let us consider the function f(x1, x2) = x1^2 x2. Its partial derivatives are

    ∂f/∂x1 (x1, x2) = 2 x1 x2  and  ∂f/∂x2 (x1, x2) = x1^2.
Let f : R^n → R. Each one of its partial derivatives ∂f(x)/∂xi is also a real-valued function of n variables, so the partials of ∂f(x)/∂xi can be defined as before. The partials of ∂f(x)/∂xi, for i = 1, ..., n, are the second-order partial derivatives of f.
The obvious limitation is that it generally yields only local results, valid only in some
small neighborhood of the initial solution.
We say that f is differentiable at x if there exists a vector a_x ∈ R^n such that

    lim_{‖h‖→0} { f(x + h) − [f(x) + a_x' h] } / ‖h‖ = 0,    (5)

where h ∈ R^n and ‖·‖ is the Euclidean norm of a vector, ‖x‖ = [Σ_{i=1}^n (x_i)^2]^{1/2}. If f is differentiable at every point in its domain, we say that f is differentiable.
The derivative of f is then the mapping

    D_x f : R^n → R^n,  x ↦ a_x,

which assigns to each point in R^n the n-dimensional vector a_x. When it is clear from the context that we are differentiating with respect to the vector x, we will write Df instead of D_x f.
It is apparent now that we can, as in the case of a univariate real-valued function, interpret the differential in terms of a linear approximation to f(x + h). That is, we can consider f(x) + a_x' h as a linear approximation of f(x + h). Expression (5) guarantees that the approximation will be good whenever h is small. [If we define E(h) = f(x + h) − [f(x) + a_x' h] as the error of the linear approximation, then condition (5) ensures lim_{‖h‖→0} |E(h)| / ‖h‖ = 0.]
The derivative of f at x, Df(x), relates to the partial derivatives of f at x in a natural way. If f : R^n → R is differentiable at x, then the derivative of f at x is the vector of partial derivatives

    Df(x) = D_x f(x) = [ ∂f/∂x1 (x)   ∂f/∂x2 (x)   ...   ∂f/∂xn (x) ].
Example 4. Consider the function f(x1, x2) = x1^2 x2 in Example 3. In this case,

    Df(x1, x2) = [ 2 x1 x2   x1^2 ]. N
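The partials of f(x1, x2) = x1^2 x2 can be confirmed with central differences; the sample point below is an arbitrary illustrative choice:

```python
# Gradient of f(x1, x2) = x1**2 * x2 at a sample point, checked against
# the partials from Example 3: (2*x1*x2, x1**2).
def f(x1, x2):
    return x1 ** 2 * x2

x1, x2, h = 1.5, 2.0, 1e-6
d1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)  # estimate of df/dx1
d2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)  # estimate of df/dx2
print(d1, d2)                 # close to 6.0 and 2.25
print(2 * x1 * x2, x1 ** 2)   # the analytic partials
```

Each partial is an ordinary univariate derivative taken while the other argument is held fixed, which is exactly what the two difference quotients do.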
The partials of ∂f(x)/∂xi, for i = 1, ..., n, are the second-order partial derivatives of f, which we identify by

    D^2 f(x) = D^2_x f(x) =
    [ ∂^2f/∂x1^2 (x)       ∂^2f/∂x2∂x1 (x)   ...   ∂^2f/∂xn∂x1 (x) ]
    [ ∂^2f/∂x1∂x2 (x)      ∂^2f/∂x2^2 (x)    ...   ∂^2f/∂xn∂x2 (x) ]
    [       ...                  ...          ...         ...       ]
    [ ∂^2f/∂x1∂xn (x)      ∂^2f/∂x2∂xn (x)   ...   ∂^2f/∂xn^2 (x)  ].

When f is twice continuously differentiable, the order of differentiation does not matter (Young's theorem):

    ∂^2f/∂xi∂xj (x) = ∂^2f/∂xj∂xi (x).
[Check Young's theorem with the function f(x1, x2) = x1^2 x2 in Example 3.]
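The bracketed exercise can also be checked numerically by nesting central differences; both orders of differentiation should give 2*x1 (the sample point is an arbitrary choice):

```python
# Mixed partials of f(x1, x2) = x1**2 * x2 via nested central differences.
# Young's theorem says the two orders of differentiation agree.
def f(x1, x2):
    return x1 ** 2 * x2

def d_x1(g, x1, x2, h=1e-5):
    return (g(x1 + h, x2) - g(x1 - h, x2)) / (2 * h)

def d_x2(g, x1, x2, h=1e-5):
    return (g(x1, x2 + h) - g(x1, x2 - h)) / (2 * h)

x1, x2 = 1.5, 2.0
d12 = d_x1(lambda a, b: d_x2(f, a, b), x1, x2)  # d/dx1 of df/dx2
d21 = d_x2(lambda a, b: d_x1(f, a, b), x1, x2)  # d/dx2 of df/dx1
print(d12, d21, 2 * x1)  # all close to 3.0
```

Analytically, ∂f/∂x2 = x1^2 so its x1-partial is 2x1, and ∂f/∂x1 = 2x1x2 so its x2-partial is also 2x1, confirming the symmetry.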
Taylor's formula can be generalized to the case of a real-valued function of n variables. Because the notation gets messy, and we will only use the simplest case, we state the following theorem for the case of a first-order approximation with a quadratic-form remainder.

Theorem 8. (Taylor's formula for multivariate functions) Let f : R^n → R be twice continuously differentiable. Then, for some λ ∈ (0, 1),

    f(x + h) = f(x) + Df(x) h + (1/2) h' D^2 f(x + λh) h.

We will use Theorem 8 to prove the sufficient conditions for maxima in optimization problems.
The Chain Rule
In many cases we are interested in the derivatives of composite functions. The following result says that the composition of differentiable functions is differentiable, and its derivative is the product of the derivatives of the original functions.
Theorem 9. (Chain rule for multivariate functions) Let g and h be two functions with

    g : R^n → R  and  h : R → R,

and define f(x) = h[g(x)], or f(x) = (h ∘ g)(x), with f : R^n → R. If g and h are differentiable, then f = h ∘ g is differentiable and

    Df(x) = h'[g(x)] Dg(x).
Proof. See de la Fuente (2000), pp. 176-178. (The proof there covers a result that is more general than the statement in Theorem 9.)
The next example sheds light on the implementation of the last result.
Example 5. Let f(x1, x2) = (x1 x2)^2. We want to use the Chain Rule to find ∂f/∂x1 and ∂f/∂x2. To this end, let us define g(x1, x2) = x1 x2 and h(y) = y^2, so that f = h ∘ g. Then

    ∂f/∂x1 (x1, x2) = h'[g(x1, x2)] ∂g/∂x1 (x1, x2) = 2 g(x1, x2) x2 = 2 x1 x2^2
    ∂f/∂x2 (x1, x2) = h'[g(x1, x2)] ∂g/∂x2 (x1, x2) = 2 g(x1, x2) x1 = 2 x1^2 x2. N
Differentiability of Functions from R^n into R^m

We now turn to the general case where f : R^n → R^m is a function of n variables whose value is a vector of m elements. As we learned in Note 1, we can think of the mapping f as a vector of component functions f_i, each of which is a real-valued function of n variables, f = (f1, f2, ..., fm)'. Thinking in this way, the extension is trivial (although the notation becomes messier!).
We say that f is differentiable at x if there exists an m × n matrix A_x such that

    lim_{‖h‖→0} ‖f(x + h) − [f(x) + A_x h]‖ / ‖h‖ = 0,    (6)

where h ∈ R^n and ‖·‖ is the Euclidean norm of a vector, ‖x‖ = [Σ_{i=1}^n (x_i)^2]^{1/2}. If f is differentiable at every point in its domain, we say that f is differentiable, and the derivative of f is the mapping

    Df = D_x f : R^n → R^{m×n},  x ↦ A_x.
At each x, the derivative is the m × n matrix of partial derivatives

    Df(x) = D_x f(x) =
    [ Df1(x) ]     [ ∂f1/∂x1 (x)   ∂f1/∂x2 (x)   ...   ∂f1/∂xn (x) ]
    [ Df2(x) ]  =  [ ∂f2/∂x1 (x)   ∂f2/∂x2 (x)   ...   ∂f2/∂xn (x) ]
    [   ...  ]     [      ...           ...       ...       ...     ]
    [ Dfm(x) ]     [ ∂fm/∂x1 (x)   ∂fm/∂x2 (x)   ...   ∂fm/∂xn (x) ],

which we often call the Jacobian matrix of f (evaluated) at x. If the partial derivatives of the component functions f1, f2, ..., fm exist and are continuous, then f is differentiable [see de la Fuente (2000), pp. 172-175].
Example 6. Consider the functions f1(x1, x2) = x1^2 x2 and f2(x1, x2) = ln(x1 + x2). Their partial derivatives are

    ∂f1/∂x1 = 2 x1 x2,  ∂f1/∂x2 = x1^2,  ∂f2/∂x1 = 1/(x1 + x2)  and  ∂f2/∂x2 = 1/(x1 + x2).

Therefore,

    Df1(x1, x2) = [ 2 x1 x2   x1^2 ]  and  Df2(x1, x2) = [ 1/(x1 + x2)   1/(x1 + x2) ].

[Calculate the same concepts for the functions g1(x1, x2) = x1 x2 and g2(x1, x2) = ln(2 x1 + 3 x2).] N
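The Jacobian of the map f = (f1, f2) in Example 6 can be approximated column by column with central differences and compared to the analytic rows; the sample point is an illustrative choice:

```python
import math

# Numerical Jacobian of f(x1, x2) = (x1**2 * x2, ln(x1 + x2)), compared
# with the rows Df1 = (2*x1*x2, x1**2) and Df2 = (1/(x1+x2), 1/(x1+x2)).
def f(x1, x2):
    return (x1 ** 2 * x2, math.log(x1 + x2))

def jacobian(g, x1, x2, h=1e-6):
    rows = len(g(x1, x2))
    J = []
    for i in range(rows):
        d1 = (g(x1 + h, x2)[i] - g(x1 - h, x2)[i]) / (2 * h)
        d2 = (g(x1, x2 + h)[i] - g(x1, x2 - h)[i]) / (2 * h)
        J.append([d1, d2])
    return J

x1, x2 = 1.0, 2.0
print(jacobian(f, x1, x2))  # close to [[4, 1], [1/3, 1/3]]
print([[2 * x1 * x2, x1 ** 2], [1 / (x1 + x2), 1 / (x1 + x2)]])
```

Each row of the Jacobian is just the gradient of one component function, which is why the same difference-quotient recipe works row by row.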
The following result states again that the composition of differentiable functions is differentiable, and its derivative is the product of the derivatives of the original functions.

Theorem 9. (Chain rule for multivariate functions) Let g and h be two functions with

    g : R^n → R^m  and  h : R^m → R^p,

and define f(x) = h[g(x)], or f(x) = (h ∘ g)(x), with f : R^n → R^p. If g and h are differentiable, then f = h ∘ g is differentiable and

    Df(x) = Dh[g(x)] Dg(x).
The last example applies the Chain Rule to a case where n = m = 3 and p = 1.

Example 7. Let l(r, s, t) = h[g(r, s, t)], with h(x, y, z) = x y^2 z and

    x = r + t,  y = s  and  z = s + t.

We want to use the Chain Rule to find ∂l/∂r, ∂l/∂s and ∂l/∂t. Let us define g = (g1, g2, g3)', with g_i : R^3 → R for i = 1, 2, 3, as follows:

    g1(r, s, t) = r + t
    g2(r, s, t) = s
    g3(r, s, t) = s + t.
The derivative of h is

    Dh(x, y, z) = [ y^2 z   2 x y z   x y^2 ].

It follows that

    Dh[g(r, s, t)] = [ s^2 (s + t)   2 (r + t) s (s + t)   (r + t) s^2 ].

Moreover,

    Dg(r, s, t) = [ 1  0  1 ]
                  [ 0  1  0 ]
                  [ 0  1  1 ],

so the Chain Rule gives

    Dl(r, s, t) = Dh[g(r, s, t)] Dg(r, s, t),

that is, ∂l/∂r = s^2 (s + t), ∂l/∂s = 2 (r + t) s (s + t) + (r + t) s^2, and ∂l/∂t = s^2 (s + t) + (r + t) s^2. N
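The multivariate Chain Rule in the last example can be verified numerically, assuming h(x, y, z) = x y^2 z (the function implied by the derivative Dh = [y^2 z, 2xyz, x y^2] shown above); the evaluation point is an arbitrary choice:

```python
# Chain-rule check: the product Dh[g(r,s,t)] * Dg(r,s,t) should match
# central-difference estimates of the partials of l = h(g(r,s,t)).
def g(r, s, t):
    return (r + t, s, s + t)

def h(x, y, z):
    return x * y ** 2 * z

def l(r, s, t):
    return h(*g(r, s, t))

r, s, t, eps = 1.0, 2.0, 3.0, 1e-6
x, y, z = g(r, s, t)
Dh = [y ** 2 * z, 2 * x * y * z, x * y ** 2]       # 1 x 3 row vector
Dg = [[1, 0, 1], [0, 1, 0], [0, 1, 1]]             # 3 x 3 Jacobian of g
chain = [sum(Dh[i] * Dg[i][j] for i in range(3)) for j in range(3)]
numeric = [
    (l(r + eps, s, t) - l(r - eps, s, t)) / (2 * eps),
    (l(r, s + eps, t) - l(r, s - eps, t)) / (2 * eps),
    (l(r, s, t + eps) - l(r, s, t - eps)) / (2 * eps),
]
print(chain)    # [20.0, 96.0, 36.0] at this point
print(numeric)  # close to the same values
```

The 1 x 3 row Dh times the 3 x 3 matrix Dg yields the 1 x 3 derivative Dl, matching the dimensions p x m times m x n = p x n from Theorem 9.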