
Curves in R2: Graphs vs Level Sets

Graphs (y = f (x)): The graph of f : R → R is

{(x, y) ∈ R2 | y = f (x)}.
Example: When we say “the curve y = x2 ,” we really mean: “The graph of the function
f (x) = x2 .” That is, we mean the set {(x, y) ∈ R2 | y = x2 }.

Level Sets (F (x, y) = c): The level set of F : R2 → R at height c is

{(x, y) ∈ R2 | F (x, y) = c}.


Example: When we say “the curve x2 + y2 = 1,” we really mean: “The level set of the
function F (x, y) = x2 + y2 at height 1.” That is, we mean the set {(x, y) ∈ R2 | x2 + y2 = 1}.

Note: Every graph is a level set (why?). But not every level set is a graph.
Graphs must pass the vertical line test. (Level sets may or may not.)

Surfaces in R3: Graphs vs Level Sets


Graphs (z = f (x, y)): The graph of f : R2 → R is

{(x, y, z) ∈ R3 | z = f (x, y)}.


Example: When we say “the surface z = x2 + y2,” we really mean: “The graph of the
function f (x, y) = x2 + y2.” That is, we mean the set {(x, y, z) ∈ R3 | z = x2 + y2}.

Level Sets (F (x, y, z) = c): The level set of F : R3 → R at height c is

{(x, y, z) ∈ R3 | F (x, y, z) = c}.


Example: When we say “the surface x2 + y2 + z2 = 1,” we really mean: “The level set of the
function F (x, y, z) = x2 + y2 + z2 at height 1.” That is, {(x, y, z) ∈ R3 | x2 + y2 + z2 = 1}.

Again: Every graph is a level set (why?). But not every level set is a graph.
Graphs must pass the vertical line test. (Level sets may or may not.)

Q: Do you see the patterns here? For example, suppose G : R7 → R.


◦ What does the graph of G look like? (A: A 7-dimensional object in R8.
That is, {(x1, . . . , x7, x8) ∈ R8 | x8 = G(x1, . . . , x7)}.)
◦ What do the level sets of G look like? (A: They are (generically) 6-dimensional
objects in R7. That is, {(x1, . . . , x7) ∈ R7 | G(x1, . . . , x7) = c}.)
Curves in R2: Examples of Graphs

{(x, y) ∈ R2 | y = ax + b}: Line (not vertical)
{(x, y) ∈ R2 | y = ax2 + bx + c}: Parabola

Curves in R2: Examples of Level Sets

{(x, y) ∈ R2 | ax + by = c}: Line
{(x, y) ∈ R2 | x2/a2 + y2/b2 = 1}: Ellipse
{(x, y) ∈ R2 | x2/a2 − y2/b2 = 1}: Hyperbola
{(x, y) ∈ R2 | −x2/a2 + y2/b2 = 1}: Hyperbola

Surfaces in R3: Examples of Graphs

{(x, y, z) ∈ R3 | z = ax + by}: Plane (not containing a vertical line)
{(x, y, z) ∈ R3 | z = x2/a2 + y2/b2}: Elliptic Paraboloid
{(x, y, z) ∈ R3 | z = −x2/a2 − y2/b2}: Elliptic Paraboloid
{(x, y, z) ∈ R3 | z = x2/a2 − y2/b2}: Hyperbolic Paraboloid (“saddle”)
{(x, y, z) ∈ R3 | z = −x2/a2 + y2/b2}: Hyperbolic Paraboloid (“saddle”)

Surfaces in R3: Examples of Level Sets

{(x, y, z) ∈ R3 | ax + by + cz = d}: Plane
{(x, y, z) ∈ R3 | x2/a2 + y2/b2 + z2/c2 = 1}: Ellipsoid
{(x, y, z) ∈ R3 | x2/a2 + y2/b2 − z2/c2 = 1}: Hyperboloid of 1 Sheet
{(x, y, z) ∈ R3 | x2/a2 − y2/b2 − z2/c2 = 1}: Hyperboloid of 2 Sheets
Two Model Examples
Example 1A (Elliptic Paraboloid): Consider f : R2 → R given by

f (x, y) = x2 + y 2 .

The level sets of f are curves in R2. Level sets are {(x, y) ∈ R2 | x2 + y2 = c}.
The graph of f is a surface in R3. Graph is {(x, y, z) ∈ R3 | z = x2 + y2}.

Notice that (0, 0, 0) is a local minimum point of the graph of f .


Note that ∂f/∂x(0, 0) = ∂f/∂y(0, 0) = 0. Also, ∂2f/∂x2(0, 0) > 0 and ∂2f/∂y2(0, 0) > 0.
Sketch the level sets of f and the graph of f :

Example 1B (Elliptic Paraboloid): Consider f : R2 → R given by

f (x, y) = −x2 − y 2 .

The level sets of f are curves in R2. Level sets are {(x, y) ∈ R2 | −x2 − y2 = c}.

The graph of f is a surface in R3. Graph is {(x, y, z) ∈ R3 | z = −x2 − y2}.

Notice that (0, 0, 0) is a local maximum point of the graph of f .


Note that ∂f/∂x(0, 0) = ∂f/∂y(0, 0) = 0. Also, ∂2f/∂x2(0, 0) < 0 and ∂2f/∂y2(0, 0) < 0.
Sketch the level sets of f and the graph of f :

Example 2 (Hyperbolic Paraboloid): Consider f : R2 → R given by

f (x, y) = x2 − y 2 .

The level sets of f are curves in R2. Level sets are {(x, y) ∈ R2 | x2 − y2 = c}.
The graph of f is a surface in R3. Graph is {(x, y, z) ∈ R3 | z = x2 − y2}.

Notice that (0, 0, 0) is a saddle point of the graph of f .


Note that ∂f/∂x(0, 0) = ∂f/∂y(0, 0) = 0. Also, ∂2f/∂x2(0, 0) > 0 while ∂2f/∂y2(0, 0) < 0.
Sketch the level sets of f and the graph of f :
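To check your sketches by machine, here is a minimal numpy/matplotlib sketch (assuming both libraries are available); each contour it draws is one level set f = c of the three model examples.

    # Contour plots of the level sets of the three model examples
    # on the square [-2, 2] x [-2, 2] (illustration, not from the text).
    import numpy as np
    import matplotlib.pyplot as plt

    x, y = np.meshgrid(np.linspace(-2, 2, 200), np.linspace(-2, 2, 200))
    examples = {"x2 + y2": x**2 + y**2,       # Example 1A: local min at origin
                "-x2 - y2": -x**2 - y**2,     # Example 1B: local max at origin
                "x2 - y2": x**2 - y**2}       # Example 2: saddle at origin

    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    for ax, (name, z) in zip(axes, examples.items()):
        cs = ax.contour(x, y, z, levels=10)   # each curve is a level set f = c
        ax.clabel(cs, inline=True, fontsize=7)
        ax.set_title("Level sets of " + name)
    plt.show()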
Introduction: Derivatives of Functions Rn → Rm
Def: Let F : Rn → Rm be a function, say

F (x1 , . . . , xn ) = (F1 (x1 , . . . , xn ), . . . , Fm (x1 , . . . , xn )).

Its derivative at the point (x1, . . . , xn) is the linear transformation
DF (x1, . . . , xn) : Rn → Rm whose (m × n) matrix is

                        [ ∂F1/∂x1  · · ·  ∂F1/∂xn ]
DF (x1, . . . , xn)  =  [    ..      ..       ..   ]
                        [ ∂Fm/∂x1  · · ·  ∂Fm/∂xn ]

Note: The columns are the partial derivatives with respect to x1, then to
x2, etc. The rows are the gradients of the component functions F1, F2, etc.:

                                                       [ ∇F1 ]
DF (x1, . . . , xn)  =  [ ∂F/∂x1  · · ·  ∂F/∂xn ]  =   [  ..  ]
                                                       [ ∇Fm ]

Example: Let F : R2 → R3 be the function

F (x, y) = (x + 2y, sin(x), ey ) = (F1 (x, y), F2 (x, y), F3 (x, y)).

Its derivative at (x, y) is a linear transformation DF (x, y) : R2 → R3 . The


matrix of the linear transformation DF (x, y) is:
              [ ∂F1/∂x  ∂F1/∂y ]   [   1       2  ]
DF (x, y)  =  [ ∂F2/∂x  ∂F2/∂y ] = [ cos(x)    0  ]
              [ ∂F3/∂x  ∂F3/∂y ]   [   0      ey  ]

Notice that (for instance) DF (1, 1) is a linear transformation, as is DF (2, 3),


etc. That is, each DF (x, y) is a linear transformation R2 → R3 .
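As a quick sanity check (not part of the original handout), each column ∂F/∂xj of DF can be approximated by central differences and compared with the matrix above; the jacobian helper below is our own illustration, assuming numpy.

    # Numerically approximate DF(x, y) for F(x, y) = (x + 2y, sin(x), ey),
    # one column per input variable, and compare with the exact matrix.
    import numpy as np

    def F(v):
        x, y = v
        return np.array([x + 2*y, np.sin(x), np.exp(y)])

    def jacobian(f, v, h=1e-6):
        v = np.asarray(v, dtype=float)
        cols = []
        for j in range(len(v)):
            e = np.zeros_like(v); e[j] = h
            cols.append((f(v + e) - f(v - e)) / (2*h))  # column j = dF/dx_j
        return np.column_stack(cols)

    a = np.array([1.0, 1.0])
    exact = np.array([[1, 2], [np.cos(1), 0], [0, np.exp(1)]])
    print(np.allclose(jacobian(F, a), exact, atol=1e-5))  # True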

Goals: We will:
◦ Interpret the derivative of F as the “best linear approximation to F .”
◦ State a Chain Rule for multivariable derivatives.
Introduction: Gradient of Functions Rn → R
Def: Let f : Rn → R be a function.
Recall: The derivative of f : Rn → R at the point x = (x1 , . . . , xn ) is the
1 × n matrix

Df (x) = [ ∂f/∂x1 · · · ∂f/∂xn ].

The gradient of f : Rn → R at the point x = (x1, . . . , xn) is the vector

∇f (x) = ( ∂f/∂x1, . . . , ∂f/∂xn ).

The directional derivative of f : Rn → R at the point x = (x1 , . . . , xn )


in the direction v ∈ Rn is the dot product of the vectors ∇f (x) and v:

Dv f (x) = ∇f (x) · v.

We will give geometric interpretations of these concepts later in the course.
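For instance (a hypothetical numeric check, with f chosen only for illustration), the dot-product formula for Dv f agrees with a difference quotient along v:

    # D_v f(x) = grad f(x) . v, checked against (f(x + t v) - f(x)) / t.
    import numpy as np

    f = lambda v: v[0]**2 + 3*v[0]*v[1]              # sample f : R2 -> R
    grad_f = lambda v: np.array([2*v[0] + 3*v[1], 3*v[0]])

    x = np.array([1.0, 2.0])
    v = np.array([3.0, 4.0]) / 5.0                   # a unit vector
    t = 1e-6
    print(grad_f(x) @ v)                             # 7.2
    print((f(x + t*v) - f(x)) / t)                   # approx 7.2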

Introduction: Hessian of Functions Rn → R


Theorem: Let f : Rn → R be a function whose second partial derivatives
are all continuous. Then:
∂2f/∂xi∂xj = ∂2f/∂xj∂xi.
In brief: “Second partials commute.”

Def: Let f : Rn → R be a function.


The Hessian of f : Rn → R at the point x = (x1 , . . . , xn ) is the n × n
matrix

           [ ∂2f/∂x1∂x1  · · ·  ∂2f/∂x1∂xn ]
Hf (x)  =  [     ..        ..        ..     ]
           [ ∂2f/∂xn∂x1  · · ·  ∂2f/∂xn∂xn ]
The directional second derivative of f : Rn → R at the point x =
(x1 , . . . , xn ) in the direction v ∈ Rn is

vT Hf (x) v.

Again, we will give geometric interpretations of these concepts later on.
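A minimal sketch of these two definitions (the sample f is again our own choice, assuming numpy): build Hf, confirm it is symmetric because second partials commute, and evaluate vT Hf (x) v.

    # Hessian of f(x, y) = x**2 * y + y**3 and a directional second derivative.
    import numpy as np

    def hessian(v):                      # exact second partials of f
        x, y = v
        return np.array([[2*y, 2*x],     # [f_xx  f_xy]
                         [2*x, 6*y]])    # [f_yx  f_yy]

    a = np.array([1.0, 2.0])
    v = np.array([1.0, 1.0])
    H = hessian(a)
    print(np.allclose(H, H.T))           # True: second partials commute
    print(v @ H @ v)                     # v^T Hf(a) v = 20.0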


Linear Approximation: Single-Variable Calculus
Review: In single-variable calc, we look at functions f : R → R. We write
y = f (x), and at a point (a, f (a)) write:

∆y ≈ dy.

Here, ∆y = f (x) − f (a), while dy = f′(a)∆x = f′(a)(x − a). So:

f (x) − f (a) ≈ f′(a)(x − a).

Therefore:
f (x) ≈ f (a) + f′(a)(x − a).

The right-hand side f (a) + f′(a)(x − a) can be interpreted as follows:

◦ It is the best linear approximation to f (x) at x = a.
◦ It is the 1st Taylor polynomial to f (x) at x = a.
◦ The line y = f (a) + f′(a)(x − a) is the tangent line at (a, f (a)).

Linear Approximation: Multivariable Calculus


Now consider functions f : Rn → Rm . At a point (a, f (a)), we have exactly
the same thing:
f (x) − f (a) ≈ Df (a)(x − a).
That is:
f (x) ≈ f (a) + Df (a)(x − a). (∗)
Note: The object Df (a) is a matrix, while (x − a) is a vector. That is,
Df (a)(x − a) is matrix-vector multiplication.

Example: Let f : R2 → R. Let’s write x = (x1 , x2 ) and a = (a1 , a2 ). Then


(∗) reads:
f (x1, x2) ≈ f (a1, a2) + [ ∂f/∂x1(a1, a2)  ∂f/∂x2(a1, a2) ] [ x1 − a1 ]
                                                             [ x2 − a2 ]

           = f (a1, a2) + ∂f/∂x1(a1, a2)(x1 − a1) + ∂f/∂x2(a1, a2)(x2 − a2).
Review: Taylor Polynomials in Single-Variable Calculus
Review: In single-variable calculus, we look at functions f : R → R.
At a point a ∈ R, the linear approximation (1st-deg Taylor polynomial)
to f is:
f (x) ≈ f (a) + f′(a)(x − a).

More accurate is the quadratic approximation (2nd-deg Taylor polynomial):

f (x) ≈ f (a) + f′(a)(x − a) + (1/2!) f″(a)(x − a)2.
We would like to have similar ideas for multivariable functions.

Linear Approximation: 1st-Deg Taylor Polynomials


Let f : Rn → Rm . The linear approximation of f at the point a ∈ Rn
is:
f (x) ≈ f (a) + Df (a)(x − a). (1)
Note that Df (a) is a matrix, while (x − a) is a vector. That is, Df (a)(x − a)
is matrix-vector multiplication.
Note that (1) is the best linear approximation to f (x) for points x near a.
It is the 1st-degree Taylor polynomial to f (x) at a.

Quadratic Approximation: 2nd-Deg Taylor Polynomials


Let f : Rn → R. The quadratic approximation of f at the point a ∈ Rn
is:
f (x) ≈ f (a) + Df (a)(x − a) + (1/2!) (x − a)T Hf (a)(x − a). (2)
Note that (x − a)T is a row vector, (x − a) is a column vector, and Hf (a)
is a matrix. So, (1/2!) (x − a)T Hf (a)(x − a) is of the form vT Av.
Note that (2) is the best quadratic approximation to f (x) for points x
near a. It is the 2nd-degree Taylor polynomial to f (x) at a.

Example: Let f : R2 → R be f (x, y) = x3 sin(y). At a = (2, π/2), we have
f (2, π/2) = 8, and

Df (2, π/2) = [ 12  0 ] ,      Hf (2, π/2) = [ 12   0 ]
                                             [  0  −8 ] .

So, writing x − a = (x − 2, y − π/2)T, formula (2) gives, for (x, y) near a:

f (x, y) ≈ 8 + [ 12  0 ] (x − a) + (1/2!) (x − a)T [ 12   0 ] (x − a)
                                                   [  0  −8 ]
         = 8 + 12(x − 2) + 6(x − 2)2 − 4(y − π/2)2.
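A numeric check of this expansion (assuming numpy): evaluate f and its quadratic approximation at a point near a and compare.

    # Quadratic (2nd-deg Taylor) approximation of f(x, y) = x**3 sin(y)
    # near a = (2, pi/2), using Df(a) = [12, 0] and Hf(a) = [[12, 0], [0, -8]].
    import numpy as np

    f = lambda x, y: x**3 * np.sin(y)
    a = np.array([2.0, np.pi/2])
    Df = np.array([12.0, 0.0])
    Hf = np.array([[12.0, 0.0], [0.0, -8.0]])

    p = np.array([2.1, np.pi/2 + 0.1])               # a nearby point
    d = p - a
    quad = 8 + Df @ d + 0.5 * (d @ Hf @ d)
    print(f(*p), quad)                               # ~9.2147 vs 9.22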
Tangent Lines/Planes to Graphs
Fact: Suppose a curve in R2 is given as a graph y = f (x). The equation of
the tangent line at (a, f (a)) is:
y = f (a) + f′(a)(x − a).
Okay, you knew this from single-variable calculus. How does the multivari-
able case work? Well:

Fact: Suppose a surface in R3 is given as a graph z = f (x, y). The equation


of the tangent plane at (a, b, f (a, b)) is:
z = f (a, b) + ∂f/∂x(a, b)(x − a) + ∂f/∂y(a, b)(y − b).
Notice the similarity between this and the linear approximation to f at (a, b).

Tangent Lines/Planes to Level Sets


Def: For a function F : Rn → R, its gradient is the vector in Rn given by:

∇F = ( ∂F/∂x1, ∂F/∂x2, . . . , ∂F/∂xn ).

Theorem: Consider a level set F (x1 , . . . , xn ) = c of a function F : Rn → R.


If (a1 , . . . , an ) is a point on the level set, then ∇F (a1 , . . . , an ) is normal to
the level set.

Corollary 1: Suppose a curve in R2 is given as a level curve F (x, y) = c.


The equation of the tangent line at a point (x0 , y0 ) on the level curve is:
∂F/∂x(x0, y0)(x − x0) + ∂F/∂y(x0, y0)(y − y0) = 0.
Corollary 2: Suppose a surface in R3 is given as a level surface F (x, y, z) = c.
The equation of the tangent plane at a point (x0 , y0 , z0 ) on the level surface
is:
∂F/∂x(x0, y0, z0)(x − x0) + ∂F/∂y(x0, y0, z0)(y − y0) + ∂F/∂z(x0, y0, z0)(z − z0) = 0.
Q: Do you see why Cor 1 and Cor 2 follow from the Theorem?
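Here is a small sketch of Corollary 2 in action (the point and tangent vector are chosen for illustration): for the unit sphere, ∇F (p) = 2p, and any direction tangent to the sphere at p is orthogonal to it.

    # Tangent plane to the level surface F = x2 + y2 + z2 = 1 at a point p:
    # grad F(p) is normal, so the plane is grad F(p) . (X - p) = 0.
    import numpy as np

    p = np.array([1.0, 2.0, 2.0]) / 3.0          # a point on the unit sphere
    normal = 2 * p                               # grad F = (2x, 2y, 2z) at p

    tangent = np.array([2.0, -1.0, 0.0]) / 3.0   # chosen orthogonal to p
    print(np.isclose(normal @ tangent, 0.0))     # True: tangent lies in plane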
Composition and Matrix Multiplication
Recall: Let f : X → Y and g : Y → Z be functions. Their composition is
the function g ◦ f : X → Z defined by

(g ◦ f )(x) = g(f (x)).

Observations:
(1) For this to make sense, we must have: co-domain(f ) = domain(g).
(2) Composition is not generally commutative: that is, f ◦ g and g ◦ f are
usually different.
(3) Composition is always associative: (h ◦ g) ◦ f = h ◦ (g ◦ f ).

Fact: If T : Rk → Rn and S : Rn → Rm are both linear transformations, then


S ◦ T is also a linear transformation.

Question: How can we describe the matrix of the linear transformation S ◦T


in terms of the matrices of S and T ?

Fact: Let T : Rk → Rn and S : Rn → Rm be linear transformations with


matrices B and A, respectively. Then the matrix of S ◦ T is the product AB.

We can multiply an m × n matrix A by an n × k matrix B. The result,


AB, will be an m × k matrix:

(m × n)(n × k) → (m × k).

Notice that n appears twice here to “cancel out.” That is, we need the number
of columns of A to equal the number of rows of B – otherwise, the product
AB makes no sense.

Example 1: Let A be a (3 × 2)-matrix, and let B be a (2 × 4)-matrix. The


product AB is then a (3 × 4)-matrix.

Example 2: Let A be a (2 × 3)-matrix, and let B be a (4 × 2)-matrix. Then


AB is not defined. (But the product BA is defined: it is a (4 × 3)-matrix.)
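A quick numeric illustration of the Fact above (random matrices; the sizes are arbitrary): applying T and then S agrees with multiplying by the single matrix AB.

    # The matrix of S o T is AB: check on T(x) = Bx, S(y) = Ay.
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((2, 3))     # S : R3 -> R2
    B = rng.standard_normal((3, 4))     # T : R4 -> R3

    x = rng.standard_normal(4)
    print(np.allclose(A @ (B @ x), (A @ B) @ x))   # True: S(T(x)) = (AB)x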
Chain Rule
Chain Rule (Matrix Form): Let f : Rn → Rm and g : Rm → Rp be any
differentiable functions. Then

D(g ◦ f )(x) = Dg(f (x)) · Df (x).

Here, the product on the right-hand side is a product of matrices.

Many texts describe the chain rule in the following more classical form.
While there is a classical form for general functions g : Rm → Rp, we will
keep things simple and only state the case of real-valued functions g : Rm → R.

Chain Rule (Classical Form): Let g = g(x1 , . . . , xm ) and suppose each


x1 , . . . , xm is a function of the variables t1 , . . . , tn . Then:
∂g/∂t1 = (∂g/∂x1)(∂x1/∂t1) + (∂g/∂x2)(∂x2/∂t1) + · · · + (∂g/∂xm)(∂xm/∂t1),
  ...
∂g/∂tn = (∂g/∂x1)(∂x1/∂tn) + (∂g/∂x2)(∂x2/∂tn) + · · · + (∂g/∂xm)(∂xm/∂tn).

Example 1: Let z = g(u, v), where u = h(x, y) and v = k(x, y). Then the
chain rule reads:
∂z/∂x = (∂z/∂u)(∂u/∂x) + (∂z/∂v)(∂v/∂x)

and

∂z/∂y = (∂z/∂u)(∂u/∂y) + (∂z/∂v)(∂v/∂y).

Example 2: Let z = g(u, v, w), where u = h(t), v = k(t), w = ℓ(t). Then
the chain rule reads:

∂z/∂t = (∂z/∂u)(∂u/∂t) + (∂z/∂v)(∂v/∂t) + (∂z/∂w)(∂w/∂t).

Since u, v, w are functions of just a single variable t, we can also write this
formula as:

∂z/∂t = (∂z/∂u)(du/dt) + (∂z/∂v)(dv/dt) + (∂z/∂w)(dw/dt).
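A numeric check of the matrix form (the particular f and g below are our own examples, assuming numpy): compare Dg(f (x)) · Df (x) against difference quotients of g ◦ f.

    # Chain rule D(g o f)(x) = Dg(f(x)) Df(x) for f : R2 -> R3, g : R3 -> R.
    import numpy as np

    f  = lambda v: np.array([v[0]*v[1], np.sin(v[0]), v[1]**2])
    Df = lambda v: np.array([[v[1], v[0]],
                             [np.cos(v[0]), 0.0],
                             [0.0, 2*v[1]]])
    g  = lambda u: u[0] + u[1]*u[2]
    Dg = lambda u: np.array([1.0, u[2], u[1]])

    x = np.array([1.0, 2.0])
    lhs = Dg(f(x)) @ Df(x)                       # the chain-rule product

    h = 1e-6                                     # central-difference check
    num = [(g(f(x + d)) - g(f(x - d))) / (2*h)
           for d in (np.array([h, 0.0]), np.array([0.0, h]))]
    print(lhs, num)                              # the two should agree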
For Clarification: The Two Forms of the Chain Rule
Q: How exactly are the two forms of the chain rule the same?
A: If we completely expand the matrix form, writing out everything in compo-
nents, we end up with the classical form. The following examples may clarify.

Example 1: Suppose z = g(u, v), where u = h(x, y) and v = k(x, y).


This setup means we essentially have two functions:
g : R2 → R and f : R2 → R2
g(u, v) = z,    f (x, y) = (h(x, y), k(x, y)) = (u, v).
The matrix form of the Chain Rule reads:
D(g ◦ f )(x, y) = Dg(f (x, y)) · Df (x, y)
" #
h i  ∂h ∂h
∂(g◦f ) ∂(g◦f ) ∂g ∂g
 ∂x ∂y
∂x , ∂y
= ∂u , ∂v ∂k ∂k
∂x ∂y
h i h i
∂z ∂z ∂g ∂h ∂g ∂k ∂g ∂h ∂g ∂k
∂x , ∂y = ∂u ∂x + ∂v ∂x , ∂u ∂y + ∂v ∂y

Setting components equal to each other, we conclude that


∂z/∂x = (∂g/∂u)(∂h/∂x) + (∂g/∂v)(∂k/∂x)

and

∂z/∂y = (∂g/∂u)(∂h/∂y) + (∂g/∂v)(∂k/∂y).
This is exactly the classical form of the Chain Rule. □

Example 2: Suppose z = g(u, v, w), where u = h(t), v = k(t), w = ℓ(t).


This setup means we essentially have two functions:
g : R3 → R and f : R → R3
g(u, v, w) = z,    f (t) = (h(t), k(t), ℓ(t)) = (u, v, w).
The matrix form of the Chain Rule reads:
D(g ◦ f )(t) = Dg(f (t)) · Df (t)
[ ∂(g◦f)/∂t ] = [ ∂g/∂u , ∂g/∂v , ∂g/∂w ] [ ∂h/∂t ]
                                           [ ∂k/∂t ]
                                           [ ∂ℓ/∂t ]

∂z/∂t = (∂g/∂u)(∂h/∂t) + (∂g/∂v)(∂k/∂t) + (∂g/∂w)(∂ℓ/∂t).

Again, we recovered the classical form of the Chain Rule. □
Inverses: Abstract Theory
Def: A function f : X → Y is invertible if there is a function f −1 : Y → X
satisfying:
f −1 (f (x)) = x, for all x ∈ X, and
f (f −1 (y)) = y, for all y ∈ Y.
In such a case, f −1 is called an inverse function for f .

In other words, the function f −1 “undoes” the function f . For example,



an inverse function of f : R → R, f (x) = x3 is f −1 : R → R, f −1 (x) = 3 x.
An inverse of g : R → (0, ∞), g(x) = 2x is g −1 : (0, ∞) → R, g −1 (x) = log2 (x).
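A quick numeric confirmation of these two pairs (assuming numpy; the sample points are arbitrary):

    # f(x) = x**3 undone by the cube root; g(x) = 2**x undone by log2.
    import numpy as np

    xs = np.linspace(-2.0, 2.0, 9)
    print(np.allclose(np.cbrt(xs**3), xs))       # f^(-1)(f(x)) = x on R
    ys = np.linspace(0.5, 8.0, 9)
    print(np.allclose(2**np.log2(ys), ys))       # g(g^(-1)(y)) = y on (0, inf)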

Whenever a new concept is defined, a mathematician asks two questions:

(1) Uniqueness: Are inverses unique? That is, must a function f have at
most one inverse f −1 , or is it possible for f to have several different inverses?
Answer: Yes.

Prop 16.1: If f : X → Y is invertible (that is, f has an inverse), then the


inverse function f −1 is unique (that is, there is only one inverse function).

(2) Existence: Do inverses always exist? That is, does every function f
have an inverse function f −1 ?
Answer: No. Some functions have inverses, but others don’t.

New question: Which functions have inverses?

Prop 16.3: A function f : X → Y is invertible if and only if f is both “one-


to-one” and “onto.”
Despite their fundamental importance, there’s no time to talk about “one-
to-one” and “onto,” so you don’t have to learn these terms. This is sad :-(

Question: If inverse functions “undo” our original functions, can they help
us solve equations? Yes! That’s the entire point:

Prop 16.2: A function f : X → Y is invertible if and only if for every b ∈ Y ,


the equation f (x) = b has exactly one solution x ∈ X.
In this case, the solution to the equation f (x) = b is given by x = f −1 (b).
Inverses of Linear Transformations
Question: Which linear transformations T : Rn → Rm are invertible? (Equiv:
Which m × n matrices A are invertible?)

Fact: If T : Rn → Rm is invertible, then m = n.


So: If an m × n matrix A is invertible, then m = n.

In other words, non-square matrices are never invertible. But square ma-
trices may or may not be invertible. Which ones are invertible? Well:

Theorem: Let A be an n × n matrix. The following are equivalent:


(i) A is invertible
(ii) N (A) = {0}
(iii) C(A) = Rn
(iv) rref(A) = In
(v) det(A) ≠ 0.

To Repeat: An n × n matrix A is invertible if and only if for every b ∈ Rn ,


the equation Ax = b has exactly one solution x ∈ Rn .
In this case, the solution to the equation Ax = b is given by x = A−1 b.

Q: How can we find inverse matrices? This is accomplished via:

Prop 16.7: If A is an invertible matrix, then rref[A | In ] = [In | A−1 ].


Useful Formula: Let A = [ a  b ]  be a 2 × 2 matrix. If A is invertible (det(A) =
                        [ c  d ]
ad − bc ≠ 0), then:

A−1 = 1/(ad − bc) [  d  −b ]
                  [ −c   a ] .
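A quick check of the 2 × 2 formula against numpy.linalg.inv (sample entries chosen so that ad − bc = 1):

    # A^(-1) = (1/(ad - bc)) [[d, -b], [-c, a]] for an invertible 2x2 matrix.
    import numpy as np

    a, b, c, d = 2.0, 1.0, 5.0, 3.0
    A = np.array([[a, b], [c, d]])
    det = a*d - b*c                                  # = 1, so A is invertible
    A_inv = (1/det) * np.array([[d, -b], [-c, a]])
    print(np.allclose(A_inv, np.linalg.inv(A)))      # True
    print(np.allclose(A @ A_inv, np.eye(2)))         # True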

Prop 16.8: Let f : X → Y and g : Y → Z be invertible functions. Then:


(a) f −1 is invertible and (f −1 )−1 = f .
(b) g ◦ f is invertible and (g ◦ f )−1 = f −1 ◦ g −1 .

Corollary: Let A, B be invertible n × n matrices. Then:


(a) A−1 is invertible and (A−1 )−1 = A.
(b) AB is invertible and (AB)−1 = B −1 A−1 .
The Gradient: Two Interpretations
Recall: For a function F : Rn → R, its gradient is the vector in Rn given
by:

∇F = ( ∂F/∂x1, ∂F/∂x2, . . . , ∂F/∂xn ).
There are two ways to think about the gradient. They are interrelated.

Gradient: Normal to Level Sets


Theorem: Consider a level set F (x1 , . . . , xn ) = c of a function F : Rn → R.
If (a1 , . . . , an ) is a point on the level set, then ∇F (a1 , . . . , an ) is normal to
the level set.

Example: If we have a level curve F (x, y) = c in R2 , the gradient vector


∇F (x0 , y0 ) is a normal vector to the level curve at the point (x0 , y0 ).

Example: If we have a level surface F (x, y, z) = c in R3 , the gradient vector


∇F (x0 , y0 , z0 ) is a normal vector to the level surface at the point (x0 , y0 , z0 ).

Normal vectors help us find tangent planes to level sets (see the handout
“Tangent Lines/Planes...”). But there’s another reason we like normal vectors.

Gradient: Direction of Steepest Ascent for Graphs


Observation: A normal vector to a level set F (x1 , . . . , xn ) = c in Rn is the
direction of steepest ascent for the graph z = F (x1 , . . . , xn ) in Rn+1 .

Example (Elliptic Paraboloid): Let f : R2 → R be f (x, y) = 2x2 + 3y2.

The level sets of f are the ellipses 2x2 + 3y2 = c in R2.
The graph of f is the elliptic paraboloid z = 2x2 + 3y2 in R3.

At the point (1, 1) ∈ R2, the gradient vector ∇f (1, 1) = (4, 6) is normal to
the level curve 2x2 + 3y2 = 5. So, if we were hiking on the surface z = 2x2 + 3y2
in R3 and were at the point (1, 1, f (1, 1)) = (1, 1, 5), to ascend the surface
the fastest, we would hike in the direction of (4, 6). □
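A one-line check of the normality claim in this example (using the standard fact that (−Fy, Fx) points along the tangent to a level curve of F):

    # grad f(1, 1) = (4, 6) is orthogonal to the level curve's tangent there.
    import numpy as np

    grad = np.array([4.0, 6.0])          # grad f(1, 1) for f = 2x2 + 3y2
    tangent = np.array([-6.0, 4.0])      # grad rotated by 90 degrees
    print(grad @ tangent)                # 0.0: normal to the level curve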
Warning: Note that ∇f is normal to the level sets of f . It is not a normal
vector to the graph of f .
Directional Derivatives
Def: For a function f : Rn → R, its directional derivative at the point
x ∈ Rn in the direction v ∈ Rn is:
Dv f (x) = ∇f (x) · v.
Here, · is the dot product of vectors. Therefore,
Dv f (x) = ‖∇f (x)‖ ‖v‖ cos θ, where θ = ∠(∇f (x), v).
Usually, we assume that v is a unit vector, meaning ‖v‖ = 1.
Example: Let f : R2 → R. Let v = (a, b). Then:

Dv f (x, y) = ∇f (x, y) · (a, b) = ( ∂f/∂x, ∂f/∂y ) · (a, b) = a ∂f/∂x + b ∂f/∂y.

In particular, we have two important special cases:


De1 f (x, y) = ∇f (x, y) · (1, 0) = ∂f/∂x

De2 f (x, y) = ∇f (x, y) · (0, 1) = ∂f/∂y.
Point: Partial derivatives are themselves examples of directional derivatives!

Namely, ∂f/∂x is the directional derivative of f in the e1-direction, while
∂f/∂y is the directional derivative in the e2-direction.

Question: At a point a, in which direction v will the function f grow the


most? i.e.: At a given point a, for which unit vector v is Dv f (a) maximized?

Theorem 6.3: Fix a point a ∈ Rn .


(a) The directional derivative Dv f (a) is maximized when v points in the
same direction as ∇f (a).
(b) The directional derivative Dv f (a) is minimized when v points in the
opposite direction as ∇f (a).

In fact: The maximum and minimum values of Dv f (a) at the point a ∈ Rn


are ‖∇f (a)‖ and −‖∇f (a)‖. (Assuming we only care about unit vectors v.)
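A brute-force illustration of Theorem 6.3 (assuming numpy, and reusing ∇f (a) = (4, 6) from the earlier hiking example): sweep Dv f (a) over many unit vectors and compare the extremes with ±‖∇f (a)‖.

    # D_v f(a) = grad . v over unit vectors v: extremes are +/- ||grad||.
    import numpy as np

    grad = np.array([4.0, 6.0])
    angles = np.linspace(0, 2*np.pi, 100_000)
    vs = np.column_stack([np.cos(angles), np.sin(angles)])  # unit vectors
    dvf = vs @ grad                       # all the directional derivatives
    print(dvf.max(), np.linalg.norm(grad))    # both approx  7.2111
    print(dvf.min(), -np.linalg.norm(grad))   # both approx -7.2111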
Determinants
There are two reasons why determinants are important:
(1) Algebra: Determinants tell us whether a matrix is invertible or not.
(2) Geometry: Determinants are related to area and volume.

Determinants: Algebra
Prop 17.3: An n × n matrix A is invertible ⇐⇒ det(A) ≠ 0.
Moreover: if A is invertible, then
det(A−1) = 1/det(A).
Properties of Determinants (17.2, 17.4):
(1) (Multiplicativity) det(AB) = det(A) det(B).
(2) (Alternation) Exchanging two rows of a matrix reverses the sign of the
determinant.
(3) (Multilinearity): First:

     [ a1   a2  · · ·  an  ]       [ b1   b2  · · ·  bn  ]       [ a1 + b1  a2 + b2  · · ·  an + bn ]
 det [ c21  c22 · · ·  c2n ] + det [ c21  c22 · · ·  c2n ] = det [   c21      c22    · · ·    c2n   ]
     [  ..   ..   ..    ..  ]      [  ..   ..   ..    ..  ]      [    ..       ..      ..      ..   ]
     [ cn1  cn2 · · ·  cnn ]       [ cn1  cn2 · · ·  cnn ]       [   cn1      cn2    · · ·    cnn   ]

and similarly for the other rows; Second:

     [ ka11  ka12  · · ·  ka1n ]         [ a11  a12  · · ·  a1n ]
 det [  a21   a22  · · ·   a2n ] = k det [ a21  a22  · · ·  a2n ]
     [   ..    ..    ..     ..  ]        [  ..   ..    ..    ..  ]
     [  an1   an2  · · ·   ann ]         [ an1  an2  · · ·  ann ]

and similarly for the other rows. Here, k ∈ R is any scalar.

Warning! Multilinearity does not say that det(A + B) = det(A) + det(B).


It also does not say det(kA) = k det(A). But: det(kA) = kⁿ det(A) is true.

Determinants: Geometry
Prop 17.5: Let A be any 2 × 2 matrix. Then the area of the parallelogram
generated by the columns of A is |det(A)|.

Prop 17.6: Let T : R2 → R2 be a linear transformation with matrix A. Let


R be a region in R2 . Then:
Area(T (R)) = |det(A)| · Area(R).
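A small check of Prop 17.5 (the sample matrix is our own): the parallelogram spanned by the columns of A has area |det(A)|.

    # Compare |det A| with a direct cross-product area computation.
    import numpy as np

    A = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
    u, v = A[:, 0], A[:, 1]
    area = abs(u[0]*v[1] - u[1]*v[0])    # area of parallelogram on u, v
    print(area, abs(np.linalg.det(A)))   # 5.0 and 5.0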
Coordinate Systems
Def: Let V be a k-dim subspace of Rn . Each basis B = {v1 , . . . , vk } deter-
mines a coordinate system on V .
That is: Every vector v ∈ V can be written uniquely as a linear combina-
tion of the basis vectors:

v = c1 v1 + · · · + ck vk .

We then call c1 , . . . , ck the coordinates of v with respect to the basis B. We


then write

         [ c1 ]
[v]B  =  [ c2 ]
         [ .. ]
         [ ck ]
Note that [v]B has k components, even though v ∈ Rn .

Note: Levandosky (L21: p 145-149) explains all this very clearly, in much
more depth than this review sheet provides. The examples are also quite
good: make sure you understand all of them.

Def: Let B = {v1 , . . . , vk } be a basis for a k-dim subspace V of Rn . The


change-of-basis matrix for the basis B is:

C = [ v1  v2  · · ·  vk ].

Every vector v ∈ V in the subspace V can be written

v = c1 v1 + · · · + ck vk .

In other words:
v = C[v]B .
This formula tells us how to go between the standard coordinates for v and
the B-coordinates of v.

Special Case: If V = Rn and B is a basis of Rn , then the matrix C will be


invertible, and therefore:
[v]B = C −1 v.
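A minimal numeric sketch of these formulas (the basis below is our own example): compute [v]B by solving Cc = v, then reconstruct v = C[v]B.

    # Change of basis in R2: v = C [v]_B and [v]_B = C^(-1) v.
    import numpy as np

    v1, v2 = np.array([1.0, 1.0]), np.array([1.0, -1.0])
    C = np.column_stack([v1, v2])          # change-of-basis matrix
    v = np.array([3.0, 1.0])

    coords = np.linalg.solve(C, v)         # [v]_B = C^(-1) v  ->  [2, 1]
    print(coords)
    print(np.allclose(C @ coords, v))      # True: v = C [v]_B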
