Notes on Calculus
This collection of notes first focuses on the calculus of functions from ℝ to ℝ and how well-behaved functions may be maximized over well-behaved domains. We will then mirror these concepts for multivariate functions.
Definition 1. A function f : ℝ → ℝ is differentiable at x if the following limit exists for all sequences x' ≠ x that converge to x.

    \lim_{x' \to x} \frac{f(x') - f(x)}{x' - x} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}

This limit is known as the derivative of f at x and is denoted \frac{d}{dx} f(x) = \frac{d f(x)}{dx} = f'(x). If this derivative is itself a continuous function, then we say that f is continuously differentiable.
Note: this limit can only be evaluated through sequences x' that are never equal to x, but nonetheless converge to x (such as x + 1/n).
The derivative represents the local limiting rate of change of the function at x: how much the output of
the function changes as the input changes some infinitesimal amount.
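As a quick numerical illustration (a Python sketch, not part of the derivation itself; the function f(x) = x² and the point x = 2 are arbitrary choices), the difference quotient approaches the derivative as h shrinks:

    # Difference quotients (f(x+h) - f(x)) / h for f(x) = x^2 at x = 2;
    # they should approach the derivative f'(2) = 4 as h shrinks.
    f = lambda x: x ** 2
    x = 2.0
    for h in [1.0, 0.1, 0.01, 0.001]:
        print(h, (f(x + h) - f(x)) / h)   # 5.0, 4.1, 4.01, 4.001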
Proof: Here we will show that f(x') − f(x) converges to 0 if f is differentiable. This is equivalent to saying that for any x' that converges to x, f(x') converges to f(x). We will use the fact that the limit of the product of converging sequences is the product of their respective limits.

    \lim_{x' \to x} \left[ f(x') - f(x) \right] = \lim_{x' \to x} \frac{f(x') - f(x)}{x' - x} \cdot (x' - x) = \lim_{x' \to x} \frac{f(x') - f(x)}{x' - x} \cdot \lim_{x' \to x} (x' - x) = f'(x) \cdot 0 = 0
The derivative of a function can be used as a local linear approximation: for any ε > 0, there are values of h that are small enough such that

    f(x + h) \in \left[ f(x) + f'(x) h - \varepsilon |h|,\; f(x) + f'(x) h + \varepsilon |h| \right]

and thus the line f(x) + f'(x)h can nearly represent small changes in the function from f(x) to f(x + h).
Properties of Derivatives
Proposition 1. If f and g are both differentiable at x, then the following properties hold.
• Linearity: \frac{d}{dx} (a f(x) + b g(x)) = a f'(x) + b g'(x)
• Product Rule: \frac{d}{dx} (f(x) g(x)) = f'(x) g(x) + f(x) g'(x)
• Chain Rule: \frac{d}{dx} f(g(x)) = f'(g(x)) g'(x)
• Inverse Rule: if f is injective (and therefore invertible) over some open ball around x = f^{-1}(y), then \frac{d}{dy} f^{-1}(y) = 1 / f'(x)
Proof: First, linearity holds with a simple application of the fact that linear combinations of convergent sequences converge to the same linear combination of their limits.

    \frac{d}{dx} [a f(x) + b g(x)] = \lim_{x' \to x} \frac{a f(x') + b g(x') - a f(x) - b g(x)}{x' - x} = \lim_{x' \to x} \left[ a \frac{f(x') - f(x)}{x' - x} + b \frac{g(x') - g(x)}{x' - x} \right]

    = a \lim_{x' \to x} \frac{f(x') - f(x)}{x' - x} + b \lim_{x' \to x} \frac{g(x') - g(x)}{x' - x} = a f'(x) + b g'(x)
Next, for the product rule we can also manipulate terms a bit to show how the derivative of a product of two differentiable functions can be expressed in terms of their individual derivatives. We will also use the fact that the product of two convergent sequences converges to the product of the limits of each sequence.

    \frac{d}{dx} [f(x) g(x)] = \lim_{x' \to x} \frac{f(x') g(x') - f(x) g(x)}{x' - x} = \lim_{x' \to x} \frac{f(x') g(x') - f(x') g(x) + f(x') g(x) - f(x) g(x)}{x' - x}

    = \lim_{x' \to x} \left[ f(x') \frac{g(x') - g(x)}{x' - x} + g(x) \frac{f(x') - f(x)}{x' - x} \right]

    = \lim_{x' \to x} f(x') \cdot \lim_{x' \to x} \frac{g(x') - g(x)}{x' - x} + \lim_{x' \to x} g(x) \cdot \lim_{x' \to x} \frac{f(x') - f(x)}{x' - x}

    = f(x) g'(x) + g(x) f'(x)
Now for the chain rule and the inverse rule, which follow from similar limit manipulations. For the inverse rule, write y = f(x) and y' = f(x') ≠ y, so that x' → x exactly when y' → y:

    \frac{d}{dy} f^{-1}(y) = \lim_{y' \to y} \frac{f^{-1}(y') - f^{-1}(y)}{y' - y} = \lim_{x' \to x} \frac{x' - x}{f(x') - f(x)} = \frac{1}{f'(x)}
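A small numerical sanity check of these rules (a Python sketch; the choices f(x) = x³, g(x) = sin(x), the evaluation point, and the step size are all arbitrary):

    import math

    # Central finite difference as a stand-in for the exact derivative.
    def d(fn, x, h=1e-6):
        return (fn(x + h) - fn(x - h)) / (2 * h)

    f, g = lambda x: x ** 3, math.sin
    fp, gp = lambda x: 3 * x ** 2, math.cos
    x = 1.3

    # Product rule: (f g)' = f' g + f g'
    print(d(lambda t: f(t) * g(t), x), fp(x) * g(x) + f(x) * gp(x))
    # Chain rule: (f o g)' = f'(g(x)) g'(x)
    print(d(lambda t: f(g(t)), x), fp(g(x)) * gp(x))
    # Inverse rule at y = f(x), with f^{-1}(y) = y^(1/3) on the positives
    y = f(x)
    print(d(lambda t: t ** (1 / 3), y), 1 / fp(x))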
We can use these properties to derive the next set of properties, most of which are useful properties of
common functions across economic theory.
Proposition 2.
• If f(x) = e^x, then f'(x) = e^x, where e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n = \lim_{n \to 0} (1 + n)^{1/n}
• If f(x) = a^x, then f'(x) = \ln(a) a^x
Proof:

    \frac{d}{dx} e^x = \lim_{h \to 0} \frac{e^{x+h} - e^x}{h} = \lim_{h \to 0} e^x \frac{e^h - 1}{h} = e^x \lim_{h \to 0} \frac{e^h - 1}{h}

Now we will let t = e^h − 1 and therefore h = ln(t + 1).

    \frac{d}{dx} e^x = e^x \lim_{t \to 0} \frac{t}{\ln(t + 1)} = e^x \lim_{t \to 0} \frac{1}{\frac{1}{t} \ln(t + 1)} = e^x \lim_{t \to 0} \frac{1}{\ln\left[(t + 1)^{1/t}\right]} = e^x \cdot \frac{1}{\ln(e)} = e^x
Having established this result, the derivatives of the next few functional forms can be deduced by their relationship to this function.

    \frac{d}{dx} a^x = \frac{d}{dx} \left[ e^{\ln(a)} \right]^x = \frac{d}{dx} e^{x \ln(a)} = \ln(a) e^{x \ln(a)} = \ln(a) a^x
For f(x) = ln(x), we can apply the inverse rule.

    \frac{d}{dx} \ln(x) = \frac{1}{\left. \frac{d}{dy} e^y \right|_{y = \ln(x)}} = \frac{1}{e^{\ln(x)}} = \frac{1}{x}
    \frac{d}{dx} x^n = \frac{d}{dx} \left[ e^{\ln(x)} \right]^n = \frac{d}{dx} e^{n \ln(x)} = \frac{n}{x} e^{n \ln(x)} = \frac{n}{x} x^n = n x^{n-1}
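A quick numerical check of the four closed forms above (a Python sketch using central finite differences; the evaluation point, base, and exponent are arbitrary choices):

    import math

    def d(fn, x, h=1e-6):
        return (fn(x + h) - fn(x - h)) / (2 * h)

    x, a, n = 1.7, 3.0, 5
    print(d(math.exp, x), math.exp(x))                    # d/dx e^x  = e^x
    print(d(lambda t: a ** t, x), math.log(a) * a ** x)   # d/dx a^x  = ln(a) a^x
    print(d(math.log, x), 1 / x)                          # d/dx ln x = 1/x
    print(d(lambda t: t ** n, x), n * x ** (n - 1))       # d/dx x^n  = n x^(n-1)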
For the quotient rule, we can evaluate the derivative of f(x)/g(x) by treating it as the product of f(x) and 1/g(x) = g(x)^{-1}.
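Carrying this out explicitly, combining the product rule, the chain rule, and the power rule derived above, recovers the usual quotient rule (a short worked computation along those lines):

    \frac{d}{dx} \frac{f(x)}{g(x)} = \frac{d}{dx} \left[ f(x) \, g(x)^{-1} \right] = f'(x) g(x)^{-1} + f(x) \left( - g(x)^{-2} g'(x) \right) = \frac{f'(x) g(x) - f(x) g'(x)}{g(x)^2}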
Proposition 3 (Mean Value Theorem). If b > a and f : [a, b] → ℝ is continuously differentiable, then there exists c ∈ (a, b) such that f'(c) = \frac{f(b) - f(a)}{b - a}.
The Mean Value Theorem can be interpreted as a proper correction of the error term in the local linear approximation previously written. If one wishes to linearly approximate f(x + h) with the derivative f', simply using f'(x) as the slope is subject to an error term. However, one could instead use f'(z) for some z ∈ (x, x + h), because the Mean Value Theorem states that such a value of z exists where f'(z) = (f(x + h) − f(x))/h, i.e. f(x + h) = f(x) + f'(z)h exactly.
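As a concrete illustration (a Python sketch; the function f(x) = x³ on [0, 2] is an arbitrary choice), we can locate a point c whose derivative matches the average rate of change:

    # For f(x) = x^3 on [a, b] = [0, 2], the Mean Value Theorem promises some
    # c in (0, 2) with f'(c) = 3c^2 equal to (f(2) - f(0)) / (2 - 0) = 4.
    a, b = 0.0, 2.0
    f = lambda x: x ** 3
    slope = (f(b) - f(a)) / (b - a)        # 4.0
    c = (slope / 3) ** 0.5                 # solve 3c^2 = slope
    print(slope, c, 3 * c ** 2)            # c ~ 1.1547, and 3c^2 = 4.0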
Proposition 4. Let f : ℝ → ℝ be a continuously differentiable function. If f'(x) > 0 for all x in an interval (a, b), then f is strictly increasing on (a, b); if f'(x) < 0 for all x in (a, b), then f is strictly decreasing on (a, b).
The derivative of a function describes its monotonicity. Understanding these monotonicities is essential in finding the maxima and minima of functions.
EX: Suppose we wish to find the maxima of the function f(x) = x^{1/x} over the positive domain x > 0. First, let's take the derivative of this function, applying the trick that x = e^{ln(x)} as well as the chain rule and quotient rule.

    \frac{d}{dx} x^{1/x} = \frac{d}{dx} e^{\ln(x)/x} = e^{\ln(x)/x} \frac{d}{dx} \left[ \frac{\ln(x)}{x} \right] = x^{1/x} \left[ \frac{1 - \ln(x)}{x^2} \right]
Since x is a positive value, so is x^{1/x}. Thus this derivative is positive iff 1 − ln(x) > 0 ⇔ x < e. When x > e, the function is decreasing, and when x = e, the derivative is 0. Thus the function is strictly increasing up to f(e), after which it decreases, so this function has a unique maximum exactly at x = e. This is of course convenient in that this function is "single-peaked". As it turns out, functions f : [a, b] → ℝ are single-peaked exactly when they are strictly quasiconcave. In this case, the function attained a maximum over an open and unbounded domain, which we cannot always guarantee. However, we know that any continuous function does attain a maximum over compact domains (EVT), and the quasiconcave case even provides uniqueness.
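A quick grid search confirms this conclusion numerically (a Python sketch; the grid range and spacing are arbitrary choices):

    import math

    # f(x) = x^(1/x) over a grid of positive x; the maximizer should be near e.
    f = lambda x: x ** (1 / x)
    xs = [0.5 + 0.001 * k for k in range(10000)]   # grid on (0.5, 10.5)
    best = max(xs, key=f)
    print(best, math.e, f(best), f(math.e))        # best ~ 2.718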
Proposition 5. If f : [a, b] → ℝ is strictly quasiconcave, then there is a unique maximizer of f over [a, b].
Most importantly though, from the previous example we saw that the maximum was achieved when the derivative was 0. In general, a 0 derivative is a major indicator of the maxima/minima of continuously differentiable functions. A point x with f'(x) = 0 is called a critical point of f.
Critical points may be local minima, local maxima, or neither (in which case we say that such a critical point is a "saddle point", such as x = 0 for f(x) = x³). To identify which of these three cases applies to a given critical point, we can use the second derivative of f at x, i.e. the derivative \frac{d}{dx} f'(x) = f''(x). If f'' is a continuous function, we say that f is twice continuously differentiable. Second derivatives describe the local convexity of a function. We will say that a function is convex at x if it is convex over some neighborhood containing x.
• If x is a critical point and f''(x) > 0, then x is a local minimum.
• If x is a critical point and f''(x) < 0, then x is a local maximum.
These conditions can be seen through the Taylor expansion of f around x: if f is k times continuously differentiable at x, then

    f(\tilde{x}) = f(x) + f'(x)(\tilde{x} - x) + \frac{1}{2} f''(x)(\tilde{x} - x)^2 + \cdots + \frac{f^{(k)}(x)}{k!} (\tilde{x} - x)^k + h_k(\tilde{x})(\tilde{x} - x)^k

where f^{(k)}(x) is the kth derivative of f at x and h_k is a function such that \lim_{\tilde{x} \to x} h_k(\tilde{x}) = 0.
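The role of the remainder term h_k can be seen numerically (a Python sketch for f(x) = e^x around x = 0 with k = 2; the choice of function and order is arbitrary):

    import math

    # Second-order Taylor approximation of e^x around 0: 1 + x + x^2/2.
    # The implied h_2(x) = (e^x - (1 + x + x^2/2)) / x^2 should vanish as x -> 0.
    for x in [0.5, 0.1, 0.01]:
        taylor = 1 + x + x ** 2 / 2
        print(x, (math.exp(x) - taylor) / x ** 2)   # roughly x/6, shrinking to 0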
Multivariate Calculus
The remainder of this collection of notes is intended to mirror the previous half, but applying these
“rate-of-change” concepts to functions f : ℝⁿ → ℝ.
Definition 4. Let f : ℝⁿ → ℝ be a continuous function. The partial derivative of f with respect to x_i at \vec{x} = (x_1, x_2, \ldots, x_n) is the following limit and is denoted \frac{\partial}{\partial x_i} f(\vec{x}) = \partial_{x_i} f(\vec{x}) = \partial_i f(\vec{x}).

    \partial_i f(\vec{x}) = \lim_{h \to 0} \frac{f(x_1, \ldots, x_i + h, \ldots, x_n) - f(\vec{x})}{h}
Partial derivatives are a very close concept to our initial notion of derivative. Suppose we have a production function f(x, y) = C x^A y^B, with current inputs at (1, 2), and we want to know the rate of change of this function as we increase x. The partial derivative ∂_x f(1, 2) treats y = 2 as a constant, and calculates a derivative while pretending that f is only a function of x: ∂_x f(x, y) = A C x^{A−1} y^B, so ∂_x f(1, 2) = A C · 1^{A−1} · 2^B.
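For instance, with the illustrative parameter values A = 0.3, B = 0.7, C = 2 (arbitrary choices for this Python sketch), a finite difference reproduces that partial derivative:

    # Cobb-Douglas f(x, y) = C x^A y^B with arbitrary illustrative parameters.
    A, B, C = 0.3, 0.7, 2.0
    f = lambda x, y: C * x ** A * y ** B
    h = 1e-6
    # Finite-difference partial in x at (1, 2) versus A*C*1^(A-1)*2^B.
    print((f(1 + h, 2) - f(1, 2)) / h, A * C * 1 ** (A - 1) * 2 ** B)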
However, increasing a single coordinate is only one way to evaluate the rate of change of a multivariate function. To illustrate the limitations of partial derivatives, consider the following function.

    f(x, y) = \begin{cases} 5x & \text{if } y = 0 \\ y & \text{if } x = 0 \\ 0 & \text{otherwise} \end{cases}

The partial derivatives at (0, 0) are ∂_x f(0, 0) = 5 and ∂_y f(0, 0) = 1. This describes how quickly the function's output changes when moving from (0, 0) to (h, 0) or (0, h), but it does not describe how the function changes for all small changes in input. In particular, moving from (0, 0) to (h, h) does not change f any amount for any h > 0. The next concept of derivative considers such changes in inputs.
Definition 5. The directional derivative along v ∈ ℝⁿ \ {0} of f : ℝⁿ → ℝ at x ∈ ℝⁿ is the following limit.

    \partial_v f(x) = \lim_{h \to 0} \frac{f(x_1 + h v_1, x_2 + h v_2, \ldots, x_n + h v_n) - f(x)}{h}
Partial derivatives are then nested examples of directional derivatives along the vectors (0, 1) and
(1, 0). In the example above, we can then say that the directional derivatives along any vectors other
than (0, 1) or (1, 0) are 0. Since we are just moving the value h ∈ ℝ, we can actually compute any
directional derivative of any function f as the derivative of a slightly obscured function.
Corollary 2. Suppose the directional derivative along v of f : ℝⁿ → ℝ exists at x and let g_{v,x}(h) = f(x + hv). Then ∂_v f(x) = g'_{v,x}(0).
Proof:

    \partial_v f(x) = \lim_{h \to 0} \frac{f(x + hv) - f(x)}{h} = \lim_{h \to 0} \frac{g_{v,x}(h) - g_{v,x}(0)}{h - 0} = \frac{d}{dh} g_{v,x}(0)
Example: Consider again the “Cobb-Douglas” production function f(x, y) = C x^A y^B. We can compute the directional derivative ∂_v f(x, y) by taking the derivative of g_{v,\vec{x}}(h) at 0.

    \partial_v f(x, y) = g'_{v,\vec{x}}(0) = A C v_1 x^{A-1} y^B + B C v_2 x^A y^{B-1} = v_1 \partial_x f(x, y) + v_2 \partial_y f(x, y)
As it turns out in this particular “well-behaved” case, any directional derivative can be expressed as a
linear combination of partial derivatives. This has a direct interpretation, as we can think of any vector
v = (v1 , v2 ) as the linear combination v1 (1, 0) + v2 (0, 1) of the standard coordinate basis vectors. Thus
the partial derivatives serve as particular directional derivatives along these basis vectors, and any other
directional derivative can be expressed as the weighted sum of each partial corresponding to these basis
directions. However, this decomposition cannot always be achieved.
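Numerically, this decomposition can be checked for the Cobb-Douglas example (a Python sketch; the parameter values, the point (1, 2), and the direction v = (1, 1) are arbitrary choices):

    A, B, C = 0.3, 0.7, 2.0
    f = lambda x, y: C * x ** A * y ** B
    x, y, v1, v2, h = 1.0, 2.0, 1.0, 1.0, 1e-6

    directional = (f(x + h * v1, y + h * v2) - f(x, y)) / h   # derivative of f(x + hv) at h = 0
    partial_x = (f(x + h, y) - f(x, y)) / h
    partial_y = (f(x, y + h) - f(x, y)) / h
    print(directional, v1 * partial_x + v2 * partial_y)       # nearly equal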
Example: Consider the following continuous function.

    f(x, y) = \begin{cases} \dfrac{x^3 + y^3}{x^2 + y^2} & \text{if } x \neq 0 \text{ or } y \neq 0 \\ 0 & \text{if } x = y = 0 \end{cases}
    \partial_x f(0, 0) = \lim_{h \to 0} \frac{f(h, 0) - f(0, 0)}{h} = \lim_{h \to 0} \frac{\frac{h^3}{h^2} - 0}{h} = 1

    \partial_y f(0, 0) = \lim_{h \to 0} \frac{f(0, h) - f(0, 0)}{h} = \lim_{h \to 0} \frac{\frac{h^3}{h^2} - 0}{h} = 1

    \partial_v f(0, 0) = \lim_{h \to 0} \frac{f(h v_1, h v_2) - f(0, 0)}{h} = \lim_{h \to 0} \frac{1}{h} \cdot \frac{(h v_1)^3 + (h v_2)^3}{(h v_1)^2 + (h v_2)^2} = \frac{v_1^3 + v_2^3}{v_1^2 + v_2^2}
This directional derivative is generally not equal to the linear combination v_1 ∂_x f(0, 0) + v_2 ∂_y f(0, 0) = v_1 + v_2. The reason for this is that while each partial derivative exists at the origin, the partial derivatives are not themselves continuous functions. Thus partial differentiability is not always sufficiently convenient. The following definition generalizes differentiability for multivariate functions.
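The failure of the decomposition for this example can also be seen numerically (a Python sketch; the direction v = (1, 1), with both coordinates nonzero, is an arbitrary choice):

    # f from the example above: (x^3 + y^3) / (x^2 + y^2) away from the origin.
    def f(x, y):
        return 0.0 if x == y == 0 else (x ** 3 + y ** 3) / (x ** 2 + y ** 2)

    v1, v2, h = 1.0, 1.0, 1e-6
    directional = (f(h * v1, h * v2) - f(0, 0)) / h   # -> (v1^3 + v2^3)/(v1^2 + v2^2) = 1
    partial_sum = v1 * 1 + v2 * 1                     # v1*dxf(0,0) + v2*dyf(0,0) = 2
    print(directional, partial_sum)                   # 1.0 vs 2.0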
Definition 6. The continuous function f : ℝⁿ → ℝ is differentiable at \vec{x} if all the partial derivatives at \vec{x} exist and

    \lim_{\vec{h} \to 0} \frac{f(\vec{x} + \vec{h}) - f(\vec{x}) - \nabla f(\vec{x}) \cdot \vec{h}}{\|\vec{h}\|} = 0

where \nabla f(\vec{x}) = [\partial_1 f(\vec{x}), \ldots, \partial_n f(\vec{x})] is the gradient of f at \vec{x}.
This notion of differentiability can be used as a local linear approximation as well: for any ε > 0, there are small enough values of \vec{h} such that

    f(\vec{x} + \vec{h}) \in \left[ f(\vec{x}) + \nabla f(\vec{x}) \cdot \vec{h} - \varepsilon \|\vec{h}\|,\; f(\vec{x}) + \nabla f(\vec{x}) \cdot \vec{h} + \varepsilon \|\vec{h}\| \right]
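To see the defining limit in action, one can check numerically that the remainder shrinks relative to ‖h‖ (a Python sketch for the smooth function f(x, y) = x² y around (1, 2), an arbitrary choice):

    # f(x, y) = x^2 y, with gradient at (1, 2) equal to (2xy, x^2) = (4, 1).
    f = lambda x, y: x ** 2 * y
    gx, gy = 4.0, 1.0
    for t in [1e-1, 1e-2, 1e-3]:
        h1, h2 = t, -t                                # one particular way of shrinking h
        norm = (h1 ** 2 + h2 ** 2) ** 0.5
        remainder = f(1 + h1, 2 + h2) - f(1, 2) - (gx * h1 + gy * h2)
        print(t, remainder / norm)                    # tends to 0 as t -> 0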
Proposition 9. Let f : ℝⁿ → ℝ be a continuous function such that there is an open neighborhood around x on which each partial derivative exists and is continuous. Then f is differentiable at x.
A proof in ℝ²: We can write out the necessary limit directly from the definition of differentiability, and write it in terms of partials by rearranging terms and making use of the Mean Value Theorem.

    \frac{f(x + h_1, y + h_2) - f(x, y) - \partial_x f(x, y) h_1 - \partial_y f(x, y) h_2}{\|\vec{h}\|} = \left[ \partial_x f(\hat{x}, y + h_2) - \partial_x f(x, y) \right] \frac{h_1}{\|\vec{h}\|} + \left[ \partial_y f(x, \hat{y}) - \partial_y f(x, y) \right] \frac{h_2}{\|\vec{h}\|}

where x̂ ∈ (x, x + h_1) and ŷ ∈ (y, y + h_2) from the Mean Value Theorem. Assuming the partial derivatives are continuous sufficiently close to x, then ∂_x f(x̂, y + h_2) → ∂_x f(x, y) and ∂_y f(x, ŷ) → ∂_y f(x, y). Thus this total limit converges to 0, because |h_1|/‖\vec{h}‖ ≤ 1 and |h_2|/‖\vec{h}‖ ≤ 1, and therefore for any ε > 0 we can find sufficiently small values of h_1 and h_2 such that each term in this sum is less than ε/2. An analogous proof can be applied for functions over ℝⁿ.