Partial Derivatives

Functions of two variables
Examples: Functions of several variables

$f(x, y) = x^2 + y^2 \Rightarrow f(1, 2) = 5$, etc.
$f(x, y) = x y^2 e^{x+y}$
$f(x, y, z) = xy \log z$
Ideal gas law: $P = kT/V$.
Dependent and independent variables
In z = f (x, y) we say x, y are independent variables and z is a dependent variable. This
indicates that x and y are free to take any values and then z depends on these values. For
now it will be clear which is which; later we'll have to take more care.
Graphs
For the function y = f (x): there is one independent variable and one dependent variable,
which means we need 2 dimensions for its graph.
Graphing technique:
go to x then compute y = f (x) then go up to height y.
For z = f (x, y) we have two independent and one dependent variable, so we need 3 dimensions
to graph the function. The technique is the same as before.
Example: Consider $z = f(x, y) = x^2 + y^2$.
To make the graph:
go to (x, y) then compute z = f (x, y) then go up to height z.
We show the plot of three points: $f(0, 0) = 0$, $f(1, 1) = 2$ and $f(0, \sqrt{2}) = 2$.
[Figure: sketch of the graph of $z = x^2 + y^2$ on x, y, z axes, with the three plotted points marked, including those above (1, 1) and $(0, \sqrt{2})$.]
The figure above shows more than just the graph of three points. Here are the steps we
used to draw the graph. Remember, this is just a sketch, it should suggest the shape of the
graph and some of its features.
1. First we draw the axes. The z-axis points up, the y-axis is to the right and the x-axis
comes out of the page, so it is drawn at the angle shown. This gives a perspective with the
eye somewhere in the first octant.
2. The yz-traces are those curves found by setting x = a constant. We start with the trace
when x = 0. This is an upward pointing parabola in the yz-plane.
3. Next we sketch the trace with z = 3. This is a circle of radius $\sqrt{3}$ at height z = 3. Note,
the traces where z = constant are generally called level curves.
This is enough for this graph. Other graphs take other traces. You should expect to do a
certain amount of trial and error before your figure looks right.
MIT OpenCourseWare
http://ocw.mit.edu
18.02SC Multivariable Calculus
Fall 2010
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
Gallery of graphs
Ellipsoid: $\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$
Hyperboloid of one sheet: $\frac{x^2}{a^2} + \frac{y^2}{b^2} - \frac{z^2}{c^2} = 1$
Hyperboloid of two sheets: $\frac{x^2}{a^2} + \frac{y^2}{b^2} - \frac{z^2}{c^2} = -1$
Elliptic cone: $\frac{x^2}{a^2} + \frac{y^2}{b^2} = \frac{z^2}{c^2}$
Elliptic paraboloid: $z = ax^2 + by^2$
Hyperbolic paraboloid: $z = by^2 - ax^2$
Level Curves and Contour Plots
Level curves and contour plots are another way of visualizing functions of two variables. If
you have seen a topographic map then you have seen a contour plot.
Example: To illustrate this we first draw the graph of $z = x^2 + y^2$. On this graph we draw
contours, which are curves at a fixed height z = constant.
For example the curve at height z = 1 is the circle $x^2 + y^2 = 1$. On the graph we have
to draw this at the correct height. Another way to show this is to draw the curves in the
xy-plane and label them with their z-value. We call these curves level curves and the entire
plot is called a contour plot.
For this example they are shown in the plot on the right. Notice that the 3D graph is simply
the level curves ’pulled out’ each to its correct height.
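As a small numerical check of this example, the following Python sketch samples points on the level curve of $z = x^2 + y^2$ at a given height c and confirms that each one sits at that height. The function names and the sample heights are illustrative choices, not part of the notes.

```python
import math

def f(x, y):
    """The example function z = x^2 + y^2."""
    return x**2 + y**2

def level_curve_points(c, n=8):
    """Return n points on the level curve f(x, y) = c, a circle of radius sqrt(c)."""
    r = math.sqrt(c)
    return [(r * math.cos(2 * math.pi * k / n), r * math.sin(2 * math.pi * k / n))
            for k in range(n)]

for c in (1, 2, 4):
    pts = level_curve_points(c)
    # Every sampled point should sit at height c, up to rounding error.
    assert all(abs(f(x, y) - c) < 1e-12 for x, y in pts)
    print(f"level curve z = {c}: circle of radius {math.sqrt(c):.3f}")
```

Plotting these circles in the xy-plane and labeling each with its value of c would give the contour plot described above.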
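[Figure: left, the graph of $z = x^2 + y^2$ with contours drawn at heights z = 1, 2 and 4; right, the corresponding level curves in the xy-plane, labeled with their z-values. The point $(1/\sqrt{2}, 1/\sqrt{2})$ on the level curve z = 1 is marked.]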
Here is another plot, of a 'mountain pass'. Notice that in the contour plot the mountain
pass is represented by a level curve that crosses itself. Moving up or down from the crossing
point the heights decrease, and moving right or left they increase.
[Figure: two panels, "Mountain pass" and "Level curves", with the level curves labeled z = 400, 600, 800 and 1000.]
Partial derivatives
Let w = f (x, y) be a function of two variables. Its graph is a surface in xyz-space, as
pictured.
[Figure: the surface w = f(x, y), with the curve w = f(x, y0) lying in the vertical plane y = y0 and the point P above (x0, y0).]
Fix a value y = y0 and just let x vary. You get a function of one variable,
(1) w = f (x, y0 ), the partial function for y = y0 .
Its graph is a curve in the vertical plane y = y0 , whose slope at the
point P where x = x0 is given by the derivative
(2) $\frac{d}{dx} f(x, y_0)\Big|_{x_0}$, or $\frac{\partial f}{\partial x}\Big|_{(x_0, y_0)}$.
We call (2) the partial derivative of f with respect to x at the point (x0 , y0 ); the right side
of (2) is the standard notation for it. The partial derivative is just the ordinary derivative
of the partial function — it is calculated by holding one variable fixed and differentiating
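with respect to the other variable. Other notations for this partial derivative are
$f_x(x_0, y_0), \quad \frac{\partial w}{\partial x}\Big|_{(x_0, y_0)}, \quad \left(\frac{\partial f}{\partial x}\right)_0, \quad \left(\frac{\partial w}{\partial x}\right)_0;$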
the first is convenient for including the specific point; the second is common in science
and engineering, where you are just dealing with relations between variables and don’t
mention the function explicitly; the third and fourth indicate the point by just using a
single subscript.
Analogously, fixing x = x0 and letting y vary, we get the partial function w = f (x0 , y),
whose graph lies in the vertical plane x = x0 , and whose slope at P is the partial derivative
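of f with respect to y; the notations are
$\frac{\partial f}{\partial y}\Big|_{(x_0, y_0)}, \quad f_y(x_0, y_0), \quad \frac{\partial w}{\partial y}\Big|_{(x_0, y_0)}, \quad \left(\frac{\partial f}{\partial y}\right)_0, \quad \left(\frac{\partial w}{\partial y}\right)_0.$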
The partial derivatives ∂f /∂x and ∂f /∂y depend on (x0 , y0 ) and are therefore functions of
x and y.
Written as ∂w/∂x, the partial derivative gives the rate of change of w with respect to x
alone, at the point (x0 , y0 ): it tells how fast w is increasing as x increases, when y is held
constant.
For a function of three or more variables, w = f (x, y, z, . . . ), we cannot draw graphs any
more, but the idea behind partial differentiation remains the same: to define the partial
derivative with respect to x, for instance, hold all the other variables constant and take the
ordinary derivative with respect to x; the notations are the same as above:
$\frac{d}{dx} f(x, y_0, z_0, \ldots)\Big|_{x_0} = f_x(x_0, y_0, z_0, \ldots), \quad \left(\frac{\partial f}{\partial x}\right)_0, \quad \left(\frac{\partial w}{\partial x}\right)_0.$
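The "hold the other variables fixed and differentiate" recipe can be imitated numerically. The sketch below is one possible illustration, assuming a central-difference quotient with a hypothetical step size h; it is applied to $f(x, y) = x y^2 e^{x+y}$ from the opening examples, whose exact partials at (1, 2) both work out to $8e^3$.

```python
import math

def partial_x(f, x0, y0, h=1e-6):
    """Central-difference estimate of df/dx at (x0, y0); y is held fixed at y0."""
    return (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)

def partial_y(f, x0, y0, h=1e-6):
    """Central-difference estimate of df/dy at (x0, y0); x is held fixed at x0."""
    return (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)

def f(x, y):
    return x * y**2 * math.exp(x + y)

# Exact values for comparison: f_x = (1 + x) y^2 e^(x+y), f_y = x (2y + y^2) e^(x+y).
exact = 8 * math.exp(3)
print(partial_x(f, 1.0, 2.0), partial_y(f, 1.0, 2.0), exact)
```

Each estimate only ever changes one argument of f, which is exactly the partial function of the definition.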
The Tangent Approximation
1. The tangent plane.

For a function of one variable, w = f (x), the tangent line to its graph
at a point (x0 , w0 ) is the line passing through (x0 , w0 ) and having slope $\left(\frac{dw}{dx}\right)_0$.
[Figure: the surface w = f(x, y), the curve w = f(x, y0), and the point P above (x0, y0).]
For a function of two variables, w = f (x, y), the natural analogue
is the tangent plane to the graph, at a point
(x0 , y0 , w0 ). What's the equation of this tangent plane? Referring
to the picture at right (this figure was also used when we
introduced partial derivatives), we see that the tangent plane
(i) must pass through (x0 , y0 , w0 ), where w0 = f (x0 , y0 );

(ii) must contain the tangent lines to the graphs of the two partial functions — this will
hold if the plane has the same slopes in the i and j directions as the surface does.
Using these two conditions, it is easy to find the equation of the tangent plane. The
general equation of a plane through (x0 , y0 , w0 ) is
A(x − x0 ) + B(y − y0 ) + C(w − w0 ) = 0 .
Assume the plane is not vertical; then C ≠ 0, so we can divide through by C and solve for
w − w0 , getting
(3) w − w0 = a(x − x0 ) + b(y − y0 ), a = −A/C, b = −B/C.
The plane passes through (x0 , y0 , w0 ); what values of the coefficients a and b will make it
also tangent to the graph there? We have
a = slope of plane (3) in the i-direction (by putting y = y0 in (3));
  = slope of graph in the i-direction (by (ii) above);
  = $\left(\frac{\partial w}{\partial x}\right)_0$ (by the definition of partial derivative);
similarly,
b = $\left(\frac{\partial w}{\partial y}\right)_0$.
Therefore the equation of the tangent plane to w = f (x, y) at (x0 , y0 ) is
(4) $w - w_0 = \left(\frac{\partial w}{\partial x}\right)_0 (x - x_0) + \left(\frac{\partial w}{\partial y}\right)_0 (y - y_0)$
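The tangent plane equation (4) translates directly into a small function. The sketch below is an illustration under stated assumptions: the helper name `tangent_plane` is hypothetical, the partial derivatives are supplied by hand, and the test surface $w = x^2 + y^2$ at the point (1, 1) is chosen only as an example.

```python
def tangent_plane(f, fx, fy, x0, y0):
    """Return the function (x, y) -> w whose graph is the tangent plane at (x0, y0)."""
    w0 = f(x0, y0)   # the plane passes through (x0, y0, w0)
    a = fx(x0, y0)   # slope in the i-direction
    b = fy(x0, y0)   # slope in the j-direction

    def plane(x, y):
        return w0 + a * (x - x0) + b * (y - y0)

    return plane

def f(x, y):
    return x**2 + y**2

plane = tangent_plane(f, lambda x, y: 2 * x, lambda x, y: 2 * y, 1.0, 1.0)

# Compare the surface and its tangent plane at the base point and nearby.
for x, y in [(1.0, 1.0), (1.1, 0.9), (1.5, 1.5)]:
    print((x, y), "surface:", f(x, y), "plane:", plane(x, y))
```

The two heights agree at the base point and drift apart as (x, y) moves away from it.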
2. The approximation formula.

The most important use for the tangent plane is to give an approximation that is the
basic formula in the study of functions of several variables — almost everything follows in
one way or another from it.
The intuitive idea is that if we stay near (x0 , y0 , w0 ), the graph of the tangent plane (4)
will be a good approximation to the graph of the function w = f (x, y). Therefore if the
point (x, y) is close to (x0 , y0 ),
(5) $f(x, y) \approx w_0 + \left(\frac{\partial w}{\partial x}\right)_0 (x - x_0) + \left(\frac{\partial w}{\partial y}\right)_0 (y - y_0)$
height of graph ≈ height of tangent plane
The function on the right side of (5) whose graph is the tangent plane is often called the
linearization of f (x, y) at (x0 , y0 ): it is the linear function which gives the best approximation
to f (x, y) for values of (x, y) close to (x0 , y0 ).
An equivalent form of the approximation (5) is obtained by using Δ notation; if we put
Δx = x − x0 , Δy = y − y0 , Δw = w − w0 ,
then (5) becomes
(6) $\Delta w \approx \left(\frac{\partial w}{\partial x}\right)_0 \Delta x + \left(\frac{\partial w}{\partial y}\right)_0 \Delta y$, if Δx ≈ 0, Δy ≈ 0 .
This formula gives the approximate change in w when we make a small change in x and y.
We will use it often.
The analogous approximation formula for a function w = f (x, y, z) of three variables
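would be
(7) $\Delta w \approx \left(\frac{\partial w}{\partial x}\right)_0 \Delta x + \left(\frac{\partial w}{\partial y}\right)_0 \Delta y + \left(\frac{\partial w}{\partial z}\right)_0 \Delta z$, if Δx, Δy, Δz ≈ 0 .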
Unfortunately, for functions of three or more variables, we can’t use a geometric argument
for the approximation formula (7); for this reason, it’s best to recast the argument for (6)
in a form which doesn’t use tangent planes and geometry, and therefore can be generalized
to several variables. This is done at the end of this Chapter TA; for now let’s just assume
the truth of (7) and its higher-dimensional analogues.
Here are two typical examples of the use of the approximation formula. Other examples
are in the Exercises. In the rest of your study of partial differentiation, you will see how the
approximation formula is used to derive the important theorems and formulas.
Example 1. Give a reasonable square, centered at (1, 1), over which the value of
$w = x^3 y^4$ will not vary by more than ±0.1.
Solution. We use (6). We calculate the two partial derivatives
$w_x = 3x^2 y^4, \quad w_y = 4x^3 y^3$
and therefore, evaluating the partials at (1, 1) and using (6), we get
Δw ≈ 3Δx + 4Δy.
Thus if |Δx| ≤ .01 and |Δy| ≤ .01, we should have
|Δw| ≤ 3|Δx| + 4|Δy| ≤ .07,
which is within the bounds. So the answer is the square with center at (1,1) given by
|x − 1| ≤ .01, |y − 1| ≤ .01 .
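The bound in Example 1 comes from the linear approximation, so it is worth checking against the function itself. The following sketch samples $w = x^3 y^4$ on a grid over the proposed square and reports the largest deviation from w(1, 1) = 1; the grid size and the helper name are illustrative assumptions.

```python
def w(x, y):
    return x**3 * y**4

def max_deviation(half_width, n=21):
    """Largest |w(x, y) - w(1, 1)| over an n-by-n grid on the square centered at (1, 1)."""
    worst = 0.0
    for i in range(n):
        for j in range(n):
            x = 1 - half_width + 2 * half_width * i / (n - 1)
            y = 1 - half_width + 2 * half_width * j / (n - 1)
            worst = max(worst, abs(w(x, y) - w(1, 1)))
    return worst

deviation = max_deviation(0.01)
print(deviation)  # about 0.072: slightly above the linear estimate .07, well inside 0.1
```

The true deviation slightly exceeds the estimate .07 because (6) ignores higher-order terms, but it stays comfortably within the required ±0.1.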
Example 2. The sides a, b, c of a rectangular box have lengths measured to be respectively
1, 2, and 3. To which of these measurements is the volume V most sensitive?
Solution. V = abc, and therefore by the approximation formula (7),
ΔV ≈ bc Δa + ac Δb + ab Δc
≈ 6 Δa + 3 Δb + 2 Δc, at (1, 2, 3);
thus it is most sensitive to small changes in side a, since Δa occurs with the largest coefficient.
(That is, if one at a time the measurement of each side were changed by say .01, it is the
change in a which would produce the biggest change in V , namely .06 .)
The result may seem paradoxical — the value of V is most sensitive to the
length of the shortest side — but it’s actually intuitive, as you can see by thinking
about how the box looks.
Sensitivity Principle The numerical value of w = f (x, y, . . . ), calculated at some
point (x0 , y0 , . . . ), will be most sensitive to small changes in that variable for which the
corresponding partial derivative wx , wy , . . . has the largest absolute value at the point.
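The sensitivity claim of Example 2 can be checked by perturbing one side at a time. This is a minimal sketch; the perturbation size 0.01 is the one suggested in the example, and the variable names are illustrative.

```python
def volume(a, b, c):
    return a * b * c

base = volume(1, 2, 3)
delta = 0.01

# Change one measurement at a time and record the resulting change in V.
changes = {
    "a": volume(1 + delta, 2, 3) - base,
    "b": volume(1, 2 + delta, 3) - base,
    "c": volume(1, 2, 3 + delta) - base,
}
most_sensitive = max(changes, key=lambda side: abs(changes[side]))
print(changes, most_sensitive)
```

The changes come out near .06, .03 and .02, matching the coefficients bc, ac and ab in the approximation formula.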
The Tangent approximation
4. Critique of the approximation formula.

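First of all, the approximation formula for functions of two or three variables
(6) $\Delta w \approx \left(\frac{\partial w}{\partial x}\right)_0 \Delta x + \left(\frac{\partial w}{\partial y}\right)_0 \Delta y$, if Δx ≈ 0, Δy ≈ 0 .
(7) $\Delta w \approx \left(\frac{\partial w}{\partial x}\right)_0 \Delta x + \left(\frac{\partial w}{\partial y}\right)_0 \Delta y + \left(\frac{\partial w}{\partial z}\right)_0 \Delta z$, if Δx, Δy, Δz ≈ 0 .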
is not a precise mathematical statement, since the symbol ≈ does not specify exactly how
close the quantities on either side of the formula are to each other. To fix this up, one
would have to specify the error in the approximation. (This can be done, but it is not often
used.)
A more fundamental objection is that our discussion of approximations was based on the
assumption that the tangent plane is a good approximation to the surface at (x0 , y0 , w0 ).
Is this really so?
Look at it this way. The tangent plane was determined as the plane which has the same
slope as the surface in the i and j directions. This means the approximation (6) will be
good if you move away from (x0 , y0 ) in the i direction (by taking Δy = 0), or in the j
direction (putting Δx = 0). But does the tangent plane have the same slope as the surface
in all the other directions as well?
Intuitively, we should expect that this will be so if the graph of f (x, y) is a “smooth”
surface at (x0 , y0 ) — it doesn’t have any sharp points, folds, or look peculiar. Here is the
mathematical hypothesis which guarantees this.
Smoothness hypothesis. We say f (x, y) is smooth at (x0 , y0 ) if
(8) fx and fy are continuous in some rectangle centered at (x0 , y0 ).
If (8) holds, the approximation formula (6) will be valid.

Though pathological examples can be constructed, in general the normal way a function
fails to be smooth (and in turn (6) fails to hold) is that one or both partial derivatives fail
to exist at (x0 , y0 ). This means of course that you won’t even be able to write the formula
(6), unless you’re sleepy. Here is a simple example.
Example 3. Where is $w = \sqrt{x^2 + y^2}$ smooth? Discuss.
Solution. Calculating formally, we get
$\frac{\partial w}{\partial x} = \frac{x}{\sqrt{x^2 + y^2}}, \quad \frac{\partial w}{\partial y} = \frac{y}{\sqrt{x^2 + y^2}}.$
These are continuous at all points except (0, 0), where they are undefined. So the function
is smooth except at the origin; the approximation formula (6) should be valid everywhere
except at the origin.
Indeed, investigating the graph of this function, since $w = \sqrt{x^2 + y^2}$ says that
height of graph over (x, y) = distance of (x, y) from w-axis,
the graph is a right circular cone, with vertex at (0, 0), axis along the w-axis, and vertex
angle a right angle. Geometrically the graph has a sharp point at the origin, so there should
be no tangent plane there, and no valid approximation formula (6) — there is no linear
function which approximates a cone at its vertex.
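The failure at the vertex can also be seen numerically. In the sketch below (an illustration, with hypothetical helper names), the difference quotient of the cone in the x-direction at the origin is computed for steps to the right and to the left; the two do not approach a common value, so the partial derivative does not exist there.

```python
import math

def w(x, y):
    return math.sqrt(x**2 + y**2)

def difference_quotient_x(h):
    """(w(h, 0) - w(0, 0)) / h, the slope of the partial function w(x, 0) over [0, h]."""
    return (w(h, 0.0) - w(0.0, 0.0)) / h

for h in (0.1, 0.001, 1e-6):
    # Approaching from the right gives slope +1, from the left slope -1.
    print(h, difference_quotient_x(h), difference_quotient_x(-h))
```

Since the one-sided slopes stay at +1 and −1 however small h becomes, there is no single number to use for $(\partial w/\partial x)_0$ in formula (6).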
A non-geometrical argument for the approximation formula
We promised earlier a non-geometrical approach to the approximation formula (6) that
would generalize to higher-dimensions, in particular to the 3-variable formula (7). This
approach will also show why the hypothesis (8) of smoothness is needed. The argument is
still imprecise, since it uses the symbol ≈, but it can be refined to a proof (which you will
find in your book, though it’s not easy reading).
It uses the one-variable approximation formula for a differentiable function w = f (u) :
(9) Δw ≈ f ′ (u0 )Δu, if Δu ≈ 0 .
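We wish to justify, without using reasoning based on 3-space, the approximation formula
(6) $\Delta w \approx \left(\frac{\partial w}{\partial x}\right)_0 \Delta x + \left(\frac{\partial w}{\partial y}\right)_0 \Delta y$, if Δx ≈ 0, Δy ≈ 0 .
[Figure: the points P = (x0, y0), Q = (x0 + Δx, y0) and R = (x0 + Δx, y0 + Δy) in the xy-plane; Q lies Δx to the right of P, and R lies Δy above Q.]
We are trying to calculate the change in w as we go from P to R in the
picture, where P = (x0 , y0 ), R = (x0 + Δx, y0 + Δy). This change can be
thought of as taking place in two steps:
(10) Δw = Δw1 + Δw2 ,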

the first being the change in w as you move from P to Q, the second the change as you
move from Q to R. Using the one-variable approximation formula (9):
(11) $\Delta w_1 \approx \frac{d}{dx} f(x, y_0)\Big|_{x_0} \cdot \Delta x = f_x(x_0, y_0)\, \Delta x;$
similarly,
$\Delta w_2 \approx \frac{d}{dy} f(x_0 + \Delta x, y)\Big|_{y_0} \cdot \Delta y = f_y(x_0 + \Delta x, y_0)\, \Delta y$
(12) $\approx f_y(x_0, y_0)\, \Delta y,$
if we assume that fy is continuous (i.e., f is smooth), since the difference between the two
terms on the right in the last two lines will then be like ε Δy, which is negligible compared
with either term itself. Substituting the two approximate values (11) and (12) into (10)
gives us the approximation formula (6).
To make this a proof, the error terms in the approximations have to be analyzed, or more
simply, one has to replace the ≈ symbol by equalities based on the Mean-Value Theorem of
one-variable calculus.
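The two-step argument can be traced with actual numbers. The sketch below is an illustration only: it reuses $w = x^3 y^4$ from Example 1 with P = (1, 1), and the step sizes Δx = 0.01, Δy = 0.02 are arbitrary choices.

```python
def f(x, y):
    return x**3 * y**4

def fx(x, y):
    return 3 * x**2 * y**4

def fy(x, y):
    return 4 * x**3 * y**3

x0, y0 = 1.0, 1.0
dx, dy = 0.01, 0.02

dw1 = f(x0 + dx, y0) - f(x0, y0)            # change from P to Q
dw2 = f(x0 + dx, y0 + dy) - f(x0 + dx, y0)  # change from Q to R
dw = f(x0 + dx, y0 + dy) - f(x0, y0)        # total change from P to R

# Right side of (6), using the partials evaluated at P only.
approx = fx(x0, y0) * dx + fy(x0, y0) * dy
print(dw1, dw2, dw, approx)
```

The sum dw1 + dw2 reproduces dw exactly, as in (10), while the approximation built from the partials at P is off by a small amount that shrinks as the steps do.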
This argument readily generalizes to the higher-dimensional approximation formulas, such
as (7); again the essential hypothesis would be smoothness: the three partial derivatives
wx , wy , wz should be continuous in a neighborhood of the point (x0 , y0 , z0 ).
Tangent approximation
1. a) Find the equation of the tangent plane to the graph of $z = x^2 + y^2$ at the point (2,1,5).
b) Give the tangent approximation for z near the point (x0 , y0 ) = (2, 1).
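Answer: a) $\frac{\partial z}{\partial x} = 2x$ and $\frac{\partial z}{\partial y} = 2y$ ⇒ $\frac{\partial z}{\partial x}(2, 1) = 4$ and $\frac{\partial z}{\partial y}(2, 1) = 2$.
The tangent plane at (2,1,5) is
$(z - 5) = \frac{\partial z}{\partial x}\Big|_0 (x - 2) + \frac{\partial z}{\partial y}\Big|_0 (y - 1) = 4(x - 2) + 2(y - 1).$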
b) The tangent approximation is the same formula, with the interpretation that for a fixed
(x0 , y0 ) the value of z on the graph of the function is near that of z on the tangent plane.
Thus, for (x, y) ≈ (2, 1) we have
Δz ≈ 4Δx + 2Δy, where Δx = x − 2, Δy = y − 1 and Δz = z − 5.
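To see how good this approximation is, one can compare the exact change in $z = x^2 + y^2$ with 4Δx + 2Δy for shrinking steps. The sketch below takes Δx = Δy = step for simplicity; that choice and the function name are illustrative assumptions.

```python
def z(x, y):
    return x**2 + y**2

def tangent_error(step):
    """Exact change in z minus the tangent approximation, for dx = dy = step at (2, 1)."""
    exact = z(2 + step, 1 + step) - z(2, 1)
    approx = 4 * step + 2 * step
    return exact - approx

for step in (0.1, 0.01, 0.001):
    print(step, tangent_error(step))
```

For this function the error is exactly Δx² + Δy², so dividing the step by 10 divides the error by 100.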
Critical Points
Critical points:
A standard question in calculus, with applications to many fields, is to find the points where
a function reaches its relative maxima and minima.
Just as in single variable calculus we will look for maxima and minima (collectively called
extrema) at points (x0 , y0 ) where the first derivatives are 0. Accordingly we define a critical
point as any point (x0 , y0 ) where
$\frac{\partial f}{\partial x}(x_0, y_0) = 0$ and $\frac{\partial f}{\partial y}(x_0, y_0) = 0.$
Often we will abbreviate this as fx = 0 and fy = 0.
Our first job is to verify that relative maxima and minima occur at critical points. The
figures below illustrate that they occur at places where the tangent plane is horizontal.
[Figures: a maximum with horizontal tangent plane (left) and a minimum with horizontal tangent plane (right).]
Since horizontal planes are of the form z = constant, and the equation of the tangent plane
at (x0 , y0 , z0 ) is
z = z0 + fx (x0 , y0 )(x − x0 ) + fy (x0 , y0 )(y − y0 )
we see it is horizontal when
fx (x0 , y0 ) = 0 and fy (x0 , y0 ) = 0.
Thus, extrema occur at critical points. But, just as in single variable calculus, not all critical
points are extrema.
Example: Find the critical points of $z = x^2 + y^2 + 0.5$.
Answer: $\frac{\partial z}{\partial x} = 2x$ and $\frac{\partial z}{\partial y} = 2y$. Clearly the only point where both derivatives are 0 is
(0, 0). Thus, there is a single critical point at (0, 0). The figure shows it is clearly the point
where z reaches a minimum value. (See the figure above on the right.)
Example: Find the critical points of $z = 1 - x^2 - y^2$.
Answer: $\frac{\partial z}{\partial x} = -2x$ and $\frac{\partial z}{\partial y} = -2y$. Clearly the only point where both derivatives are
0 is (0, 0). Thus, there is a single critical point at (0, 0). The figure shows it is clearly the
point where z reaches a maximum value. (See the figure above on the left.)
Example: Find the critical points of $z = -x^2 + y^2$.
Answer: $\frac{\partial z}{\partial x} = -2x$ and $\frac{\partial z}{\partial y} = 2y$. Clearly the only point where both derivatives are
0 is (0, 0). Thus, there is a single critical point at (0, 0). The figure shows it is neither a
minimum nor a maximum.
[Figure: saddle with horizontal tangent plane.]
Example: Making a box with minimum material.

A box is made of cardboard with double thick sides, a triple thick bottom, single thick front
and back and no top. Its volume is 3.
What dimensions use the least amount of cardboard?
Answer: The box shown has dimensions x, y, and z.
The area of one side = yz. There are two double thick sides ⇒ cardboard used = 4yz.
The area of the front (and back) = xz. It is single thick ⇒ cardboard used = 2xz.
The area of the bottom = xy. It is triple thick ⇒ cardboard used = 3xy.
Thus, the total cardboard used is
w = 4yz + 2xz + 3xy.
The volume = 3 = xyz ⇒ $z = \frac{3}{xy}$. Substituting this in the formula for w gives
$w = \frac{12}{x} + \frac{6}{y} + 3xy.$
We find the critical points of w.
$w_x = -\frac{12}{x^2} + 3y = 0, \quad w_y = -\frac{6}{y^2} + 3x = 0.$
The first equation implies $y = \frac{4}{x^2}$. Substituting this in the second equation gives $-\frac{6}{16}x^4 + 3x = 0$.
Thus, x = 0 or 2. We reject 0 since then y is undefined. Using x = 2 we find y = 1. Thus,
there is one critical point at (2,1), and at this point we have z = 3/2.
This point gives the box with minimum cardboard used because physically we know it must
have a minimum somewhere. Later we will learn to check this with the second derivative
test.
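Until the second derivative test is available, a quick numerical check can support the claim. The sketch below (illustrative names, an arbitrary step of 0.1) confirms that both partials of w vanish at (2, 1) and that w is larger at the eight neighboring grid points.

```python
def cardboard(x, y):
    """Total cardboard w = 12/x + 6/y + 3xy after substituting z = 3/(xy)."""
    return 12 / x + 6 / y + 3 * x * y

def gradient(x, y):
    """The partial derivatives (w_x, w_y)."""
    return (-12 / x**2 + 3 * y, -6 / y**2 + 3 * x)

x0, y0 = 2.0, 1.0
print("gradient at (2, 1):", gradient(x0, y0))
print("w at (2, 1):", cardboard(x0, y0), "z =", 3 / (x0 * y0))

# Compare with the eight neighbors one step away in each direction.
step = 0.1
neighbors = [cardboard(x0 + i * step, y0 + j * step)
             for i in (-1, 0, 1) for j in (-1, 0, 1) if (i, j) != (0, 0)]
print("smallest neighboring value:", min(neighbors))
```

This is evidence rather than proof: it only samples a few nearby points, which is why the notes defer the real check to the second derivative test.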
[Figure: the box with dimensions x, y and z.]
Least Squares Interpolation
1. The least-squares line.

Suppose you have a large number n of experimentally determined points, through which
you want to pass a curve. There is a formula (the Lagrange interpolation formula) producing
a polynomial curve of degree n − 1 which goes through the points exactly. But normally one
wants to find a simple curve, like a line, parabola, or exponential, which goes approximately
through the points, rather than a high-degree polynomial which goes exactly through them.
The reason is that the location of the points is to some extent determined by experimental
error, so one wants a smooth-looking curve which averages out these errors, not a wiggly
polynomial which takes them seriously.
In this section, we consider the most common case — finding a line which
goes approximately through a set of data points.
Suppose the data points are
(x1 , y1 ), (x2 , y2 ), . . . , (xn , yn )
and we want to find the line
(1) y = ax + b
which “best” passes through them. Assuming our errors in measurement are distributed
randomly according to the usual bell-shaped curve (the so-called “Gaussian distribution”),
it can be shown that the right choice of a and b is the one for which the sum D of the
squares of the deviations
(2) $D = \sum_{i=1}^{n} \big(y_i - (a x_i + b)\big)^2$
is a minimum. In the formula (2), the quantities in parentheses (shown by
dotted lines in the picture) are the deviations between the observed values
yi and the ones axi + b that would be predicted using the line (1).
[Figure: data points (xi, yi), the line y = ax + b, and dotted vertical segments joining each data point to the point (xi, axi + b) on the line.]
The deviations are squared for theoretical reasons connected with the assumed Gaussian
error distribution; note however that the effect is to ensure that we sum only positive
quantities; this is important, since we do not want deviations of opposite sign to cancel each
other out. It also weights more heavily the larger deviations, keeping experimenters honest,
since they tend to ignore large deviations (“I had a headache that day”).
This prescription for finding the line (1) is called the method of least squares, and the
resulting line (1) is called the least-squares line or the regression line.
To calculate the values of a and b which make D a minimum, we see where the two partial
derivatives are zero:
(3) $\frac{\partial D}{\partial a} = \sum_{i=1}^{n} 2(y_i - a x_i - b)(-x_i) = 0$
$\frac{\partial D}{\partial b} = \sum_{i=1}^{n} 2(y_i - a x_i - b)(-1) = 0.$
These give us a pair of linear equations for determining a and b, as we see by collecting
terms and cancelling the 2’s:
(4) $\left(\sum x_i^2\right) a + \left(\sum x_i\right) b = \sum x_i y_i$
$\left(\sum x_i\right) a + n\, b = \sum y_i.$
(Notice that it saves a lot of work to differentiate (2) using the chain rule, rather than first
expanding out the squares.)
The equations (4) are usually divided by n to make them more expressive:
(5) $\bar{s}\, a + \bar{x}\, b = \frac{1}{n} \sum x_i y_i$
$\bar{x}\, a + b = \bar{y},$
where $\bar{x}$ and $\bar{y}$ are the averages of the xi and yi , and $\bar{s} = \sum x_i^2 / n$ is the average of the squares.
From this point on use linear algebra to determine a and b. It is a good exercise to see
that the equations are always solvable unless all the xi are the same (in which case the best
line is vertical and can’t be written in the form (1)).
In practice, least-squares lines are found by pressing a calculator button, or giving a
MatLab command. Examples of calculating a least-squares line are in the exercises accompanying
the course. Do them from scratch, starting from (2), since the purpose here is to
get practice with max-min problems in several variables; don’t plug into the equations (5).
Remember to differentiate (2) using the chain rule; don’t expand out the squares, which
leads to messy algebra and highly probable error.
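For checking hand computations afterwards, the pair of equations (4) can be solved in a few lines. This is a minimal sketch, assuming Cramer's rule for the 2 × 2 system; the function name and the sample data are hypothetical.

```python
def least_squares_line(points):
    """Solve the equations (4) for the slope a and intercept b of y = ax + b."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xx = sum(x * x for x, _ in points)
    sum_xy = sum(x * y for x, y in points)

    # Determinant of the system; it is zero exactly when all the x_i are equal.
    det = sum_xx * n - sum_x * sum_x
    if det == 0:
        raise ValueError("all x_i are equal: the best line is vertical")

    a = (sum_xy * n - sum_x * sum_y) / det
    b = (sum_xx * sum_y - sum_x * sum_xy) / det
    return a, b

# Made-up data lying exactly on y = 2x + 1, so the fit should recover a = 2, b = 1.
print(least_squares_line([(0, 1), (1, 3), (2, 5), (3, 7)]))
```

The zero-determinant case corresponds to the remark above that the equations are solvable unless all the xi are the same.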
2. Fitting curves by least squares.

If the experimental points seem to follow a curve rather than a line, it might make more
sense to try to fit a second-degree polynomial
(6) $y = a_0 + a_1 x + a_2 x^2$
to them. If there are only three points, we can do this exactly (by the Lagrange interpolation
formula). For more points, however, we once again seek the values of a0 , a1 , a2 for which
the sum of the squares of the deviations
(7) $D = \sum_{i=1}^{n} \big(y_i - (a_0 + a_1 x_i + a_2 x_i^2)\big)^2$
is a minimum. Now there are three unknowns, a0 , a1 , a2 . Calculating (remember to use the
chain rule!) the three partial derivatives ∂D/∂ai , i = 0, 1, 2, and setting them equal to zero
leads to a square system of three linear equations; the ai are the three unknowns, and the
coefficients depend on the data points (xi , yi ). They can be solved by finding the inverse
matrix, elimination, or using a calculator or MatLab.
If the points seem to lie more and more along a line as x → ∞, but lie on one side of the
line for low values of x, it might be reasonable to try a function which has similar behavior,
like
(8) $y = a_0 + a_1 x + \frac{a_2}{x}$
and again minimize the sum of the squares of the deviations, as in (7). In general, this
method of least squares applies to a trial expression of the form
(9) y = a0 f0 (x) + a1 f1 (x) + . . . + ar fr (x),
where the fi (x) are given functions (usually simple ones like 1, x, $x^2$, 1/x, $e^{kx}$, etc.). Such an
expression (9) is called a linear combination of the functions fi (x). The method produces
a square inhomogeneous system of linear equations in the unknowns a0 , . . . , ar which can
be solved by finding the inverse matrix to the system, or by elimination.
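As a sketch of how this works in practice, the code below builds the square system obtained by setting each ∂D/∂aj to zero, namely $\sum_k \big(\sum_i f_j(x_i) f_k(x_i)\big) a_k = \sum_i y_i f_j(x_i)$, and solves it by elimination. The function names, the pivoting tolerance and the sample data are all illustrative assumptions.

```python
def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting, followed by back substitution."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("system is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution

def least_squares_fit(points, basis):
    """Coefficients a_0..a_r minimizing the sum of squared deviations for form (9)."""
    matrix = [[sum(fj(x) * fk(x) for x, _ in points) for fk in basis] for fj in basis]
    rhs = [sum(y * fj(x) for x, y in points) for fj in basis]
    return solve_linear_system(matrix, rhs)

# Made-up data lying exactly on y = 1 + 2x + 3x^2, fitted with the basis 1, x, x^2.
basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
points = [(x, 1 + 2 * x + 3 * x * x) for x in (-2, -1, 0, 1, 2)]
coefficients = least_squares_fit(points, basis)
print(coefficients)
```

Swapping the basis list for `[lambda x: 1.0, lambda x: x, lambda x: 1 / x]` would fit the form (8) instead, provided no data point has x = 0.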
The method also applies to finding a linear function
(10) z = a1 + a2 x + a3 y
to fit a set of data points
(11) (x1 , y1 , z1 ), . . . , (xn , yn , zn ) .
where there are two independent variables x and y and a dependent variable z (this is
the quantity being experimentally measured, for different values of (x, y)). This time after
differentiation we get a 3 × 3 system of linear equations for determining a1 , a2 , a3 .
The essential point in all this is that the unknown coefficients ai should occur linearly
in the trial function. Try fitting a function like $ce^{kx}$ to data points by using least squares,
and you’ll see the difficulty right away. (Since this is an important problem — fitting an
exponential to data points — one of the Exercises explains how to adapt the method to this
type of problem.)
Least squares interpolation
1. Use the method of least squares to fit a line to the three data points
(0, 0), (1, 2), (2, 1).
Answer: We are looking for the line y = ax + b that best models the data. The deviation
of a data point (xi , yi ) from the model is
yi − (axi + b).
By best we mean the line that minimizes the sum of the squares of the deviation. That is
we want to minimize
$D = (0 - (a \cdot 0 + b))^2 + (2 - (a \cdot 1 + b))^2 + (1 - (a \cdot 2 + b))^2$
$= b^2 + (2 - a - b)^2 + (1 - 2a - b)^2.$
(Remember, the variables whose values are to be found are a and b.) We do not expand
out the squares, rather we take the derivatives first. Setting the derivatives equal to 0 gives
$\frac{\partial D}{\partial a} = -2(2 - a - b) - 4(1 - 2a - b) = 0 \Rightarrow 10a + 6b = 8 \Rightarrow 5a + 3b = 4$
$\frac{\partial D}{\partial b} = 2b - 2(2 - a - b) - 2(1 - 2a - b) = 0 \Rightarrow 6a + 6b = 6 \Rightarrow 3a + 3b = 3.$
This linear system of two equations in two unknowns is easy to solve. We get
$a = \frac{1}{2}, \quad b = \frac{1}{2}.$
Here is a plot of the problem.
[Figure: the three data points and the least-squares line $y = \frac{1}{2}x + \frac{1}{2}$.]
Second Derivative Test
1. The Second Derivative Test

We begin by recalling the situation for twice differentiable functions f (x) of one variable.
To find their local (or “relative”) maxima and minima, we
1. find the critical points, i.e., the solutions of f ′ (x) = 0;
2. apply the second derivative test to each critical point x0 :
f ′′ (x0 ) > 0 ⇒ x0 is a local minimum point;
f ′′ (x0 ) < 0 ⇒ x0 is a local maximum point.
The idea behind it is: at x0 the slope f ′ (x0 ) = 0; if f ′′ (x0 ) > 0, then f ′ (x) is strictly
increasing for x near x0 , so that the slope is negative to the left of x0 and positive to the
right, which shows that x0 is a minimum point. The reasoning for the maximum point is
similar.
If f ′′ (x0 ) = 0, the test fails and one has to investigate further, by taking more derivatives,
or getting more information about the graph. Besides being a maximum or minimum, such
a point could also be a horizontal point of inflection.
The analogous test for maxima and minima of functions of two variables f (x, y) is a
little more complicated, since there are several equations to satisfy, several derivatives to be
taken into account, and another important geometric possibility for a critical point, namely
a saddle point. This is a local minimax point; around such a point the graph of f (x, y)
looks like the central part of a saddle, or the region around the highest point of a mountain
pass. In the neighborhood of a saddle point, the graph of the function lies both above and
below its horizontal tangent plane at the point.
The second-derivative test for maxima, minima, and saddle points has two steps.
1. Find the critical points by solving the simultaneous equations
$f_x(x, y) = 0, \quad f_y(x, y) = 0.$
Since a critical point (x0 , y0 ) is a solution to both equations, both partial derivatives are
zero there, so that the tangent plane to the graph of f (x, y) is horizontal.
2. To test such a point to see if it is a local maximum or minimum point, we calculate
the three second derivatives at the point (we use subscript 0 to denote evaluation at (x0 , y0 ),
so for example (f )0 = f (x0 , y0 )), and denote the values by A, B, and C:
(1) A = (fxx )0 , B = (fxy )0 = (fyx )0 , C = (fyy )0 ,
(we are assuming the derivatives exist and are continuous).
Second-derivative test. Let (x0 , y0 ) be a critical point of f (x, y), and A, B, and C
be as in (1). Then
$AC - B^2 > 0$, A > 0 or C > 0 ⇒ (x0 , y0 ) is a minimum point;
$AC - B^2 > 0$, A < 0 or C < 0 ⇒ (x0 , y0 ) is a maximum point;
$AC - B^2 < 0$ ⇒ (x0 , y0 ) is a saddle point.
If $AC - B^2 = 0$, the test fails and more investigation is needed.

Note that if $AC - B^2 > 0$, then AC > 0, so that A and C must have the same sign.
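The test is mechanical enough to write as a short function. The sketch below is one way to encode it; the function name and return strings are illustrative. It is tried on three simple surfaces, $x^2 + y^2 + 0.5$, $1 - x^2 - y^2$ and $-x^2 + y^2$, whose second derivatives at (0, 0) are easy to read off.

```python
def classify_critical_point(A, B, C):
    """Apply the second-derivative test to A = f_xx, B = f_xy, C = f_yy at a critical point."""
    discriminant = A * C - B * B
    if discriminant > 0:
        # A and C have the same sign here, so checking A alone is enough.
        return "minimum" if A > 0 else "maximum"
    if discriminant < 0:
        return "saddle"
    return "inconclusive"

print(classify_critical_point(2, 0, 2))    # z = x^2 + y^2 + 0.5 at (0, 0)
print(classify_critical_point(-2, 0, -2))  # z = 1 - x^2 - y^2 at (0, 0)
print(classify_critical_point(-2, 0, 2))   # z = -x^2 + y^2 at (0, 0)
```

When the discriminant is positive, A cannot be zero, so the two branches of the first case cover every possibility.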
Example 1. Find the critical points of $w = 12x^2 + y^3 - 12xy$ and determine their type.
Solution. We calculate the partial derivatives easily:
(2) $w_x = 24x - 12y, \quad w_y = 3y^2 - 12x;$
$A = w_{xx} = 24, \quad B = w_{xy} = -12, \quad C = w_{yy} = 6y.$
To find the critical points we solve simultaneously the equations wx = 0 and wy = 0; we get
$w_x = 0, \ w_y = 0 \ \Rightarrow \ y = 2x, \ y^2 = 4x \ \Rightarrow \ 4x^2 = 4x \ \Rightarrow \ x = 0, 1 \ \Rightarrow \ (x, y) = (0, 0) \text{ or } (1, 2).$
Thus there are two critical points: (0, 0) and (1, 2). To determine their type, we use the
second derivative test: we have $AC - B^2 = 144y - 144$, so that
at (0, 0), we have $AC - B^2 = -144$, so it is a saddle point;
at (1, 2), we have $AC - B^2 = 144$ and A > 0, so it is a minimum point.
A plot of the level curves is given at the right, which confirms the above.
[Figure: level curves of $w = 12x^2 + y^3 - 12xy$ for x and y between −3 and 3.]
Note that the behavior of the level curves near the origin can be determined by using the approximation
$w \approx 12x^2 - 12xy$; this shows the level curves near (0, 0) look
like those of the function x(x − y): the family of hyperbolas
x(x − y) = c, with asymptotes given by the degenerate hyperbola
x(x − y) = 0, i.e., the pair of lines x = 0 (the y-axis) and
x − y = 0 (the diagonal line y = x).
2. Justification for the Second-derivative Test.
The test involves the quantity $AC - B^2$. In general, whenever we see the expressions
$B^2 - 4AC$ or $B^2 - AC$ or their negatives, it means the quadratic formula is involved, in one
of its two forms (the second is often used to get rid of the excess two’s):
(3) $Ax^2 + Bx + C = 0 \ \Rightarrow \ x = \frac{-B \pm \sqrt{B^2 - 4AC}}{2A}$
(4) $Ax^2 + 2Bx + C = 0 \ \Rightarrow \ x = \frac{-B \pm \sqrt{B^2 - AC}}{A}$
This is what is happening here. We want to know whether, near a critical point P0 , the
graph of our function w = f (x, y) always stays on one side of its horizontal tangent plane
(P0 is then a maximum or minimum point), or whether it lies partly above and partly below
the tangent plane (P0 is then a saddle point). As we will see, this is determined by how the
graph of a quadratic function f (x) lies with respect to the x-axis. Here is the basic lemma.
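Lemma. For the quadratic function $Ax^2 + 2Bx + C$,
$$AC - B^2 > 0,\ A > 0 \text{ or } C > 0 \;\Rightarrow\; Ax^2 + 2Bx + C > 0 \text{ for all } x; \tag{5}$$
$$AC - B^2 > 0,\ A < 0 \text{ or } C < 0 \;\Rightarrow\; Ax^2 + 2Bx + C < 0 \text{ for all } x; \tag{6}$$
$$AC - B^2 < 0 \;\Rightarrow\; \begin{cases} Ax^2 + 2Bx + C > 0 & \text{for some } x; \\ Ax^2 + 2Bx + C < 0 & \text{for some } x. \end{cases} \tag{7}$$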
Proof of the Lemma. To prove (5), we note that the quadratic formula in the form (4)
shows that the zeros of Ax2 + 2Bx + C are imaginary, i.e., it has no real zeros. Therefore
its graph must lie entirely on one side of the x-axis; which side can be determined from
either A or C, since
$$A > 0 \;\Rightarrow\; \lim_{x \to \infty} (Ax^2 + 2Bx + C) = \infty; \qquad C > 0 \;\Rightarrow\; Ax^2 + 2Bx + C > 0 \text{ when } x = 0.$$
If A < 0 or C < 0, the reasoning is analogous and proves (6).

If on the other hand AC − B 2 < 0, formula (4) shows the quadratic function has two real
roots, so that its parabolic graph crosses the x-axis twice, and hence lies partly above and
partly below it. This proves (7).
Proof of the Second-derivative Test in a special case.

The simplest function is a linear function, w = w0 + ax + by, but it does not in general
have maximum or minimum points and its second derivatives are all zero. The simplest
functions to have interesting critical points are the quadratic functions, which we write in
the form (the 2’s will be explained momentarily):
$$w = w_0 + ax + by + \tfrac{1}{2}(Ax^2 + 2Bxy + Cy^2). \tag{8}$$
Such a function has in general a unique critical point, which we will assume is (0, 0); this
gives the function a special form, which we can determine by evaluating its partial derivatives
at (0, 0):
$$(w_x)_0 = a, \quad (w_y)_0 = b; \qquad w_{xx} = A, \quad w_{xy} = B, \quad w_{yy} = C. \tag{9}$$
(The neat look of the above explains the $\tfrac{1}{2}$ and $2B$ in (8).) Since (0, 0) is a critical point,
(9) shows that a = 0 and b = 0, so our quadratic function has the form
$$w - w_0 = \tfrac{1}{2}(Ax^2 + 2Bxy + Cy^2). \tag{10}$$
We moved w0 to the left side since the tangent plane at (0, 0) is the horizontal plane w = w0 ,
and we are interested in whether the graph of the quadratic function lies above or below
this tangent plane, i.e., whether w − w0 > 0 or w − w0 < 0 at points other than the origin.
If $(x, y) \neq (0, 0)$, then either $x \neq 0$ or $y \neq 0$; say $y \neq 0$. Then we write (10) as
$$w - w_0 = \frac{y^2}{2}\left[ A\left(\frac{x}{y}\right)^2 + 2B\left(\frac{x}{y}\right) + C \right]. \tag{11}$$
We know that $y^2 > 0$ if $y \neq 0$; applying our previous lemma to the factor on the right of
(11), (or if $y = 0$, switching the roles of x and y in (11) and applying the lemma), we get
$$AC - B^2 > 0,\ A > 0 \text{ or } C > 0 \;\Rightarrow\; w - w_0 > 0 \text{ for all } (x, y) \neq (0, 0) \;\Rightarrow\; (0, 0) \text{ is a minimum point};$$
$$AC - B^2 > 0,\ A < 0 \text{ or } C < 0 \;\Rightarrow\; w - w_0 < 0 \text{ for all } (x, y) \neq (0, 0) \;\Rightarrow\; (0, 0) \text{ is a maximum point};$$
$$AC - B^2 < 0 \;\Rightarrow\; \begin{cases} w - w_0 > 0 & \text{for some } (x, y); \\ w - w_0 < 0 & \text{for some } (x, y); \end{cases} \;\Rightarrow\; (0, 0) \text{ is a saddle point}.$$
Argument for the Second-derivative Test for a general function.

This part won’t be rigorous, only suggestive, but it will give the right idea.
We consider a general function w = f (x, y), and assume it has a critical point at (x0 , y0 ),
and continuous second derivatives in the neighborhood of the critical point. Then by a
generalization of Taylor’s formula to functions of several variables, the function has a best
quadratic approximation at the critical point. To simplify the notation, we will move the
critical point to the origin by making the change of variables
u = x − x0 , v = y − y0 .
Then the best quadratic approximation is (if the x, y on the left and u, v on the right is
upsetting, just imagine u and v replaced everywhere by x − x0 and y − y0 ):
$$w = f(x, y) \approx w_0 + \tfrac{1}{2}\left(Au^2 + 2Buv + Cv^2\right); \tag{13}$$
here the coefficients A, B, C are given as in (1) by the second partial derivatives with respect
to u and v at (0, 0), or what is the same (according to the chain rule—see the footnote below),
by the second partial derivatives with respect to x and y at (x0 , y0 ).
(Intuitively, one can see the coefficients have these values by differentiating
both sides of (13) and pretending the approximation is an equality. There are no
linear terms in u and v on the right since (0, 0) is a critical point.)
Since the quadratic function on the right of (13) is the best approximation to w = f (x, y)
for (x, y) close to (x0 , y0 ), it is reasonable to suppose that their graphs are essentially the
same near (x0 , y0 ), so that if the quadratic function has a maximum, minimum or saddle
point there, so will f (x, y). Thus our results for the special case of a quadratic function
having the origin as critical point carry over to the general function f (x, y) at a critical
point (x0 , y0 ), if we interpret A, B, C as the second partial derivatives at (x0 , y0 ).
This is what the second derivative test says.
Footnote: Using u = x − x0 and v = y − y0 , we can apply the chain rule for partial
derivatives, which tells us that for all x, y and the corresponding u, v, we have
$$w_x = w_u \frac{\partial u}{\partial x} + w_v \frac{\partial v}{\partial x} = w_u, \quad \text{since } u_x = 1 \text{ and } v_x = 0,$$
and similarly, $w_y = w_v$. Therefore at the corresponding points,
$$(w_x)_{(x_0, y_0)} = (w_u)_{(0,0)}, \qquad (w_y)_{(x_0, y_0)} = (w_v)_{(0,0)},$$
and differentiating once more and using the same reasoning,
$$(w_{xx})_{(x_0, y_0)} = (w_{uu})_{(0,0)}, \qquad (w_{xy})_{(x_0, y_0)} = (w_{uv})_{(0,0)}, \qquad (w_{yy})_{(x_0, y_0)} = (w_{vv})_{(0,0)}.$$
MIT OpenCourseWare

Fall 2010
Second derivative test
1. Find and classify all the critical points of
$f(x, y) = x^6 + y^3 + 6x - 12y + 7$.
Answer: Taking the first partials of z = f (x, y) and setting them to 0:
$$\frac{\partial z}{\partial x} = 6x^5 + 6 = 0 \quad \text{and} \quad \frac{\partial z}{\partial y} = 3y^2 - 12 = 0.$$
The first equation implies x = −1 and the second implies y = ±2. Thus, the critical points
are (−1, 2) and (−1, −2).
Taking second partials:
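$$\frac{\partial^2 z}{\partial x^2} = 30x^4, \quad \frac{\partial^2 z}{\partial x\,\partial y} = 0, \quad \frac{\partial^2 z}{\partial y^2} = 6y.$$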
We analyze each critical point in turn.
At (-1,-2): A = zxx (−1, −2) = 30, B = zxy (−1, −2) = 0, C = zyy (−1, −2) = −12.
Therefore AC − B 2 = −360 < 0, which implies the critical point is a saddle.
At (-1,2): A = zxx (−1, 2) = 30, B = zxy (−1, 2) = 0, C = zyy (−1, 2) = 12.
Therefore AC − B 2 = 360 > 0 and A > 0, which implies the critical point is a minimum.
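When second partials are tedious to compute by hand, they can be estimated by central differences. The sketch below does this for the function of problem 1; the helper names `hessian` and `classify` and the step size `h` are assumptions made for illustration, and the estimates are only approximate.

```python
def f(x, y):
    return x**6 + y**3 + 6 * x - 12 * y + 7


def hessian(g, x, y, h=1e-4):
    """Central-difference estimates of (g_xx, g_xy, g_yy) at (x, y)."""
    gxx = (g(x + h, y) - 2 * g(x, y) + g(x - h, y)) / h**2
    gyy = (g(x, y + h) - 2 * g(x, y) + g(x, y - h)) / h**2
    gxy = (g(x + h, y + h) - g(x + h, y - h)
           - g(x - h, y + h) + g(x - h, y - h)) / (4 * h**2)
    return gxx, gxy, gyy


def classify(A, B, C):
    """Second derivative test; returns 'inconclusive' when AC - B^2 is exactly 0."""
    disc = A * C - B * B
    if disc > 0:
        return "minimum" if A > 0 else "maximum"
    return "saddle" if disc < 0 else "inconclusive"


# Classify the two critical points found above.
types = {pt: classify(*hessian(f, *pt)) for pt in [(-1, -2), (-1, 2)]}
print(types)
```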
Chain Rule and Total Differentials
1. Find the total differential of $w = x^3yz + xy + z + 3$ at (1, 2, 3).
Answer: The total differential at the point $(x_0, y_0, z_0)$ is
$$dw = w_x(x_0, y_0, z_0)\,dx + w_y(x_0, y_0, z_0)\,dy + w_z(x_0, y_0, z_0)\,dz.$$
In our case,
$$w_x = 3x^2yz + y, \quad w_y = x^3z + x, \quad w_z = x^3y + 1.$$
Substituting in the point (1, 2, 3) we get: $w_x(1, 2, 3) = 20$, $w_y(1, 2, 3) = 4$, $w_z(1, 2, 3) = 3$.
Thus,
$$dw = 20\,dx + 4\,dy + 3\,dz.$$
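The total differential is the linear approximation to the actual change in w. A minimal numerical sketch, assuming an arbitrary small step (dx, dy, dz) chosen only for illustration:

```python
def w(x, y, z):
    return x**3 * y * z + x * y + z + 3


# Hypothetical small step away from the point (1, 2, 3).
dx, dy, dz = 1e-3, -2e-3, 5e-4

# Actual change in w versus the change predicted by dw = 20 dx + 4 dy + 3 dz.
actual = w(1 + dx, 2 + dy, 3 + dz) - w(1, 2, 3)
linear = 20 * dx + 4 * dy + 3 * dz

print(actual, linear)
```

The two numbers agree up to terms of second order and higher in the step size.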
2. Suppose $w = x^3yz + xy + z + 3$ and
$$x = 3\cos t, \quad y = 3\sin t, \quad z = 2t.$$
Compute $\frac{dw}{dt}$ and evaluate it at $t = \pi/2$.
Answer: We do not substitute for x, y, z before differentiating, so we can practice the chain
rule.
$$\frac{dw}{dt} = \frac{\partial w}{\partial x}\frac{dx}{dt} + \frac{\partial w}{\partial y}\frac{dy}{dt} + \frac{\partial w}{\partial z}\frac{dz}{dt} = (3x^2yz + y)(-3\sin t) + (x^3z + x)(3\cos t) + (x^3y + 1)(2).$$
At $t = \pi/2$ we have $x = 0$, $y = 3$, $z = \pi$, $\sin(\pi/2) = 1$, $\cos(\pi/2) = 0$.
Thus,
$$\left.\frac{dw}{dt}\right|_{\pi/2} = (3)(-3) + (0)(0) + (1)(2) = -7.$$
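The chain rule value can be compared with a direct numerical derivative of w along the curve. This is a sketch only; the function names and the step size `h` are illustrative assumptions.

```python
import math


def w_of_t(t):
    # Substitute the curve into w, then treat w as a function of t alone.
    x, y, z = 3 * math.cos(t), 3 * math.sin(t), 2 * t
    return x**3 * y * z + x * y + z + 3


def chain_rule(t):
    # dw/dt = w_x x'(t) + w_y y'(t) + w_z z'(t), with the variables left mixed.
    x, y, z = 3 * math.cos(t), 3 * math.sin(t), 2 * t
    return ((3 * x**2 * y * z + y) * (-3 * math.sin(t))
            + (x**3 * z + x) * (3 * math.cos(t))
            + (x**3 * y + 1) * 2)


t0, h = math.pi / 2, 1e-6
exact = chain_rule(t0)
numeric = (w_of_t(t0 + h) - w_of_t(t0 - h)) / (2 * h)  # central difference
print(exact, numeric)
```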
3. Show how the tangent approximation formula leads to the chain rule that was used in
the previous problem.
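Answer: The approximation formula is
$$\Delta w \approx \left.\frac{\partial f}{\partial x}\right|_o \Delta x + \left.\frac{\partial f}{\partial y}\right|_o \Delta y + \left.\frac{\partial f}{\partial z}\right|_o \Delta z.$$
If x, y, z are functions of time then dividing the approximation formula by Δt gives
$$\frac{\Delta w}{\Delta t} \approx \left.\frac{\partial f}{\partial x}\right|_o \frac{\Delta x}{\Delta t} + \left.\frac{\partial f}{\partial y}\right|_o \frac{\Delta y}{\Delta t} + \left.\frac{\partial f}{\partial z}\right|_o \frac{\Delta z}{\Delta t}.$$
In the limit as Δt → 0 we get the chain rule.
Note: we use the regular 'd' for the derivative $\frac{dw}{dt}$ because in the chain of computations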
t → x, y, z → w
the dependent variable w is ultimately a function of exactly one independent variable t.
Thus, the derivative with respect to t is not a partial derivative.
Chain rule
Now we will formulate the chain rule when there is more than one independent variable.
We suppose w is a function of x, y and that x, y are functions of u, v. That is,
w = f (x, y) and x = x(u, v), y = y(u, v).
The use of the term chain comes because to compute w we need to do a chain of computations
(u, v) → (x, y) → w.
We will say w is a dependent variable, u and v are independent variables and x and y
are intermediate variables.
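Since w is a function of x and y it has partial derivatives $\frac{\partial w}{\partial x}$ and $\frac{\partial w}{\partial y}$.
Since, ultimately, w is a function of u and v we can also compute the partial derivatives
$\frac{\partial w}{\partial u}$ and $\frac{\partial w}{\partial v}$. The chain rule relates these derivatives by the following formulas.
$$\frac{\partial w}{\partial u} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial u}$$
$$\frac{\partial w}{\partial v} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial v}.$$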
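Example: Given $w = x^2y + y^2 + x$, $x = u^2v$, $y = uv^2$ find $\frac{\partial w}{\partial u}$.
Answer: First we compute
$$\frac{\partial w}{\partial x} = 2xy + 1, \quad \frac{\partial w}{\partial y} = x^2 + 2y, \quad \frac{\partial x}{\partial u} = 2uv, \quad \frac{\partial y}{\partial u} = v^2, \quad \frac{\partial x}{\partial v} = u^2, \quad \frac{\partial y}{\partial v} = 2uv.$$
The chain rule then implies
$$\frac{\partial w}{\partial u} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial u} = (2xy + 1)\,2uv + (x^2 + 2y)\,v^2$$
$$\frac{\partial w}{\partial v} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial v} = (2xy + 1)\,u^2 + (x^2 + 2y)\,2uv.$$
Often, it is okay to leave the variables mixed together. If, for example, you wanted to
compute $\frac{\partial w}{\partial u}$ when (u, v) = (1, 2) all you have to do is compute x and y and use these
values, along with u, v, in the formula for $\frac{\partial w}{\partial u}$.
$$x = 2,\ y = 4 \;\Rightarrow\; \frac{\partial w}{\partial u} = (17)(4) + (12)(4) = 116.$$
If you actually need the derivatives expressed in just the variables u and v then you would
have to substitute for x and y.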
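The "mixed variables" evaluation can be sketched in Python and compared with a finite-difference estimate of the same partial derivative. The function names and the step size `h` are illustrative assumptions.

```python
def x_of(u, v):
    return u**2 * v


def y_of(u, v):
    return u * v**2


def w_of(x, y):
    return x**2 * y + y**2 + x


def dw_du(u, v):
    # Chain rule with the variables left mixed: compute x, y first, then plug in.
    x, y = x_of(u, v), y_of(u, v)
    return (2 * x * y + 1) * (2 * u * v) + (x**2 + 2 * y) * v**2


def dw_du_numeric(u, v, h=1e-6):
    # Hold v fixed and difference w along u.
    def g(s):
        return w_of(x_of(s, v), y_of(s, v))
    return (g(u + h) - g(u - h)) / (2 * h)


# At (u, v) = (1, 2): x = 2, y = 4, so 2xy + 1 = 17 and x^2 + 2y = 12.
print(dw_du(1, 2), dw_du_numeric(1, 2))
```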
Proof of the chain rule:
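Just as before our argument starts with the tangent approximation at the point $(x_0, y_0)$.
$$\Delta w \approx \left.\frac{\partial w}{\partial x}\right|_o \Delta x + \left.\frac{\partial w}{\partial y}\right|_o \Delta y.$$
Now hold v constant and divide by Δu to get
$$\frac{\Delta w}{\Delta u} \approx \left.\frac{\partial w}{\partial x}\right|_o \frac{\Delta x}{\Delta u} + \left.\frac{\partial w}{\partial y}\right|_o \frac{\Delta y}{\Delta u}.$$
Finally, letting Δu → 0 gives the chain rule for $\frac{\partial w}{\partial u}$.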
Ambiguous notation
Often you have to figure out the dependent and independent variables from context.
Thermodynamics is a big player here. It has, for example, the variables P, T, V, U, S,
and any two can be taken to be independent and the others are functions of those two.
We will do more with this topic in the future.
Chain rule with more variables
1. Let $w = xyz$, $x = u^2v$, $y = uv^2$, $z = u^2 + v^2$.
a) Use the chain rule to find $\frac{\partial w}{\partial u}$.
b) Find the total differential dw in terms of du and dv.
c) Find $\frac{\partial w}{\partial u}$ at the point (u, v) = (1, 2).
Answer: a) The chain rule says
$$\frac{\partial w}{\partial u} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial u} + \frac{\partial w}{\partial z}\frac{\partial z}{\partial u} = (yz)(2uv) + (xz)(v^2) + (xy)(2u).$$
b) Using the formulas given we get
$$dw = yz\,dx + xz\,dy + xy\,dz$$
and
$$dx = 2uv\,du + u^2\,dv, \quad dy = v^2\,du + 2uv\,dv, \quad dz = 2u\,du + 2v\,dv.$$
Substituting for dx, dy, dz in the equation for dw gives
$$dw = (yz)(2uv\,du + u^2\,dv) + (xz)(v^2\,du + 2uv\,dv) + (xy)(2u\,du + 2v\,dv)$$
$$= (2yzuv + xzv^2 + 2xyu)\,du + (yzu^2 + 2xzuv + 2xyv)\,dv.$$
Therefore
$$\frac{\partial w}{\partial u} = 2yzuv + xzv^2 + 2xyu \quad \text{and} \quad \frac{\partial w}{\partial v} = yzu^2 + 2xzuv + 2xyv.$$
c) We do the chain of computations to compute the partial.
$$(u, v) = (1, 2) \;\Rightarrow\; (x, y, z) = (2, 4, 5) \;\Rightarrow\; \frac{\partial w}{\partial u} = (20)(4) + (10)(4) + (8)(2) = 136.$$
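One way to check the coefficients of du and dv is to substitute first and differentiate by hand: here w = xyz = u⁵v³ + u³v⁵. The sketch below compares the two routes; the function names `partials` and `direct` are illustrative.

```python
def partials(u, v):
    """Coefficients of du and dv in dw, with x, y, z left mixed in."""
    x, y, z = u**2 * v, u * v**2, u**2 + v**2
    dw_du = 2 * y * z * u * v + x * z * v**2 + 2 * x * y * u
    dw_dv = y * z * u**2 + 2 * x * z * u * v + 2 * x * y * v
    return dw_du, dw_dv


def direct(u, v):
    """Partials of w = u^5 v^3 + u^3 v^5, obtained by substituting before differentiating."""
    return 5 * u**4 * v**3 + 3 * u**2 * v**5, 3 * u**5 * v**2 + 5 * u**3 * v**4


print(partials(1, 2), direct(1, 2))
```

Both routes use integer arithmetic at (1, 2), so the comparison is exact.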
Gradient: definition and properties
Definition of the gradient

If w = f (x, y), then $\frac{\partial w}{\partial x}$ and $\frac{\partial w}{\partial y}$ are the rates of change of w in the i and j directions.
It will be quite useful to put these two derivatives together in a vector called the gradient
of w.
$$\operatorname{grad} w = \left\langle \frac{\partial w}{\partial x}, \frac{\partial w}{\partial y} \right\rangle.$$
We will also use the symbol ∇w to denote the gradient. (You read this as 'gradient of w'
or 'grad w'.)
Of course, if we specify a point $P_0 = (x_0, y_0)$, we can evaluate the gradient at that point.
We will use several notations for this
$$\operatorname{grad} w(x_0, y_0) = \nabla w|_{P_0} = \nabla w|_o = \left\langle \left.\frac{\partial w}{\partial x}\right|_o, \left.\frac{\partial w}{\partial y}\right|_o \right\rangle.$$
Note well the following: (as we look more deeply into properties of the gradient these can
be points of confusion).
1. The gradient takes a scalar function f (x, y) and produces a vector ∇f.
2. The vector ∇f (x, y) lies in the plane.
For functions w = f (x, y, z) we have the gradient
$$\operatorname{grad} w = \nabla w = \left\langle \frac{\partial w}{\partial x}, \frac{\partial w}{\partial y}, \frac{\partial w}{\partial z} \right\rangle.$$
That is, the gradient takes a scalar function of three variables and produces a three-dimensional
vector.
The gradient has many geometric properties. In the next session we will prove that for
w = f (x, y) the gradient is perpendicular to the level curves f (x, y) = c. We can show this
by direct computation in the following example.
Example 1: Compute the gradient of $w = (x^2 + y^2)/3$ and show that the gradient at
$(x_0, y_0) = (1, 2)$ is perpendicular to the level curve through that point.
Answer: The gradient is easily computed
$$\nabla w = \langle 2x/3,\ 2y/3 \rangle = \tfrac{2}{3}\langle x, y \rangle.$$
At (1, 2) we get $\nabla w(1, 2) = \tfrac{2}{3}\langle 1, 2 \rangle$. The level curve through (1, 2) is
$$(x^2 + y^2)/3 = 5/3,$$
which is identical to $x^2 + y^2 = 5$. That is, it is a circle of radius $\sqrt{5}$ centered
at the origin. Since the gradient at (1, 2) is a multiple of ⟨1, 2⟩, it points
radially outward and hence is perpendicular to the circle.
[Figure: the gradient field and the level curves of $z = (x^2 + y^2)/3$.]
Example 2: Consider the graph of $y = e^x$. Find a vector perpendicular to the tangent
to $y = e^x$ at the point (1, e).
Old method: Find the slope, take the negative reciprocal and make the vector.
New method: This graph is the level curve $w = 0$ of the function $w = y - e^x$.
$\nabla w = \langle -e^x, 1 \rangle$ ⇒ (at x = 1) $\nabla w(1, e) = \langle -e, 1 \rangle$ is perpendicular to the tangent vector to
the graph, $\mathbf{v} = \langle 1, e \rangle$.
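Both examples come down to a dot product being zero. A small sketch, in which the tangent vector ⟨−2, 1⟩ to the circle at (1, 2) is an assumption obtained by rotating the radius ⟨1, 2⟩ through 90 degrees:

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


# Example 1: w = (x^2 + y^2)/3 at (1, 2); the level curve is the circle x^2 + y^2 = 5.
grad1 = (2 * 1 / 3, 2 * 2 / 3)
tangent1 = (-2.0, 1.0)  # radius <1, 2> rotated by 90 degrees

# Example 2: w = y - e^x at (1, e); the graph y = e^x has tangent vector <1, e>.
grad2 = (-math.e, 1.0)
tangent2 = (1.0, math.e)

print(dot(grad1, tangent1), dot(grad2, tangent2))
```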
Higher dimensions
Similarly, for w = f (x, y, z) we get level surfaces f (x, y, z) = c. The gradient is perpendicular
to the level surfaces.
Example 3: Find the tangent plane to the surface $x^2 + 2y^2 + 3z^2 = 6$ at the point
P = (1, 1, 1).
Answer: Introduce a new variable
$$w = x^2 + 2y^2 + 3z^2.$$
Our surface is the level surface w = 6. Saying the gradient is perpendicular to the surface
means exactly the same thing as saying it is normal to the tangent plane. Computing
$$\nabla w = \langle 2x, 4y, 6z \rangle \;\Rightarrow\; \nabla w|_P = \langle 2, 4, 6 \rangle.$$
Using point normal form we get the equation of the tangent plane is
$$2(x - 1) + 4(y - 1) + 6(z - 1) = 0, \quad \text{or} \quad 2x + 4y + 6z = 12.$$

Gradient: proof that it is perpendicular to level curves and
surfaces
Let w = f (x, y, z) be a function of 3 variables. We will show that at any point
$P = (x_0, y_0, z_0)$ on the level surface f (x, y, z) = c (so $f(x_0, y_0, z_0) = c$) the gradient $\nabla f|_P$
is perpendicular to the surface.
By this we mean it is perpendicular to the tangent to any curve that lies on the surface and
goes through P. (See figure.)
This follows easily from the chain rule: Let
$$\mathbf{r}(t) = \langle x(t), y(t), z(t) \rangle$$
be a curve on the level surface with $\mathbf{r}(t_0) = \langle x_0, y_0, z_0 \rangle$. We let g(t) = f (x(t), y(t), z(t)).
Since the curve is on the level surface we have g(t) = f (x(t), y(t), z(t)) = c. Differentiating
this equation with respect to t gives
$$\frac{dg}{dt} = \left.\frac{\partial f}{\partial x}\right|_P \left.\frac{dx}{dt}\right|_{t_0} + \left.\frac{\partial f}{\partial y}\right|_P \left.\frac{dy}{dt}\right|_{t_0} + \left.\frac{\partial f}{\partial z}\right|_P \left.\frac{dz}{dt}\right|_{t_0} = 0.$$
In vector form this is
$$\left\langle \left.\frac{\partial f}{\partial x}\right|_P, \left.\frac{\partial f}{\partial y}\right|_P, \left.\frac{\partial f}{\partial z}\right|_P \right\rangle \cdot \left\langle \left.\frac{dx}{dt}\right|_{t_0}, \left.\frac{dy}{dt}\right|_{t_0}, \left.\frac{dz}{dt}\right|_{t_0} \right\rangle = 0 \;\Leftrightarrow\; \nabla f|_P \cdot \mathbf{r}'(t_0) = 0.$$
Since the dot product is 0, we have shown that the gradient is perpendicular to the tangent
to any curve that lies on the level surface, which is exactly what we needed to show.
[Figure: a level surface with a curve through P and the gradient ∇f normal to the surface.]
Tangent Plane to a Level Surface
1. Find the tangent plane to the surface $x^2 + 2y^2 + 3z^2 = 36$ at the point P = (1, 2, 3).
Answer: In order to use gradients we introduce a new variable
$$w = x^2 + 2y^2 + 3z^2.$$
Our surface is then the level surface w = 36. Therefore the normal to the surface is
$$\nabla w = \langle 2x, 4y, 6z \rangle.$$
At the point P we have $\nabla w|_P = \langle 2, 8, 18 \rangle$. Using point normal form, the equation of the
tangent plane is
$$2(x - 1) + 8(y - 2) + 18(z - 3) = 0, \quad \text{or equivalently} \quad 2x + 8y + 18z = 72.$$
2. Use gradients and level surfaces to find the normal to the tangent plane of the graph of
z = f (x, y) at $P = (x_0, y_0, z_0)$.
Answer: Introduce the new variable
$$w = f(x, y) - z.$$
The graph of z = f (x, y) is just the level surface w = 0. We compute the normal to the
surface to be
$$\nabla w = \langle f_x, f_y, -1 \rangle.$$
At the point P the normal is $\langle f_x(x_0, y_0), f_y(x_0, y_0), -1 \rangle$, so the equation of the tangent
plane is
$$f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0) - (z - z_0) = 0.$$
We can write this in a more compact form as
$$(z - z_0) = \left.\frac{\partial f}{\partial x}\right|_0 (x - x_0) + \left.\frac{\partial f}{\partial y}\right|_0 (y - y_0),$$
which is exactly the formula we saw earlier for the tangent plane to a graph.
Directional Derivatives
Directional derivative
Like all derivatives the directional derivative can be thought of as a ratio. Fix a unit vector
u and a point P0 in the plane. The directional derivative of w at P0 in the direction u
is defined as
$$\left.\frac{dw}{ds}\right|_{P_0, \mathbf{u}} = \lim_{\Delta s \to 0} \frac{\Delta w}{\Delta s}.$$
Here Δw is the change in w caused by a step of length Δs in the direction of u (all in the
xy-plane).
Below we will show that
$$\left.\frac{dw}{ds}\right|_{P_0, \mathbf{u}} = \nabla w(P_0) \cdot \mathbf{u}. \tag{1}$$
We illustrate this with a figure showing the graph of w = f (x, y). Notice that Δs is
measured in the plane and Δw is the change of w on the graph.
[Figure: the graph of w = f (x, y) over the xy-plane, showing a step Δs from P0 in the direction u and the resulting change Δw.]
Proof of equation 1
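The figure below represents the change in position from P0 resulting from taking a step of
size Δs in the u direction.
[Figure: a right triangle at P0 with legs Δx and Δy and hypotenuse Δs along u.]
Since $(\Delta s)^2 = (\Delta x)^2 + (\Delta y)^2$ we have that $\left\langle \frac{\Delta x}{\Delta s}, \frac{\Delta y}{\Delta s} \right\rangle$ is a unit vector, so
$$\mathbf{u} = \left\langle \frac{\Delta x}{\Delta s}, \frac{\Delta y}{\Delta s} \right\rangle.$$
The tangent plane approximation at P0 is
$$\Delta w \approx \left.\frac{\partial w}{\partial x}\right|_{P_0} \Delta x + \left.\frac{\partial w}{\partial y}\right|_{P_0} \Delta y.$$
Dividing this approximation by Δs gives
$$\frac{\Delta w}{\Delta s} \approx \left.\frac{\partial w}{\partial x}\right|_{P_0} \frac{\Delta x}{\Delta s} + \left.\frac{\partial w}{\partial y}\right|_{P_0} \frac{\Delta y}{\Delta s}.$$
We can rewrite this as a dot product
$$\frac{\Delta w}{\Delta s} \approx \left\langle \left.\frac{\partial w}{\partial x}\right|_{P_0}, \left.\frac{\partial w}{\partial y}\right|_{P_0} \right\rangle \cdot \left\langle \frac{\Delta x}{\Delta s}, \frac{\Delta y}{\Delta s} \right\rangle.$$
In the dot product the first term is $\nabla w|_{P_0}$ and the second is just u, so,
$$\frac{\Delta w}{\Delta s} \approx \nabla w|_{P_0} \cdot \mathbf{u}.$$
Now taking the limit we get equation (1).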
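Example: (Algebraic example) Let $w = x^3 + 3y^2$.
Compute $\frac{dw}{ds}$ at P0 = (1, 2) in the direction of v = 3i + 4j.
Answer: We compute all the necessary pieces:
i) $\nabla w = \langle 3x^2, 6y \rangle$ ⇒ $\nabla w|_{(1,2)} = \langle 3, 12 \rangle$.
ii) u must be a unit vector, so $\mathbf{u} = \frac{\mathbf{v}}{|\mathbf{v}|} = \left\langle \frac{3}{5}, \frac{4}{5} \right\rangle$.
iii) $\left.\frac{dw}{ds}\right|_{P_0, \mathbf{u}} = \nabla w|_{(1,2)} \cdot \mathbf{u} = \langle 3, 12 \rangle \cdot \left\langle \frac{3}{5}, \frac{4}{5} \right\rangle = \frac{57}{5}$.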
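The three steps above can be sketched in Python and compared with a finite-difference estimate taken along the direction u. The function names and the step length `s` are illustrative assumptions.

```python
import math


def w(x, y):
    return x**3 + 3 * y**2


def grad_w(x, y):
    return (3 * x**2, 6 * y)


def directional_derivative(point, v):
    # Normalize v to a unit vector u, then take the dot product with the gradient.
    norm = math.hypot(*v)
    u = (v[0] / norm, v[1] / norm)
    g = grad_w(*point)
    return g[0] * u[0] + g[1] * u[1]


value = directional_derivative((1, 2), (3, 4))

# Central difference along u = <3/5, 4/5> with a small step s.
s, u = 1e-6, (0.6, 0.8)
numeric = (w(1 + s * u[0], 2 + s * u[1]) - w(1 - s * u[0], 2 - s * u[1])) / (2 * s)
print(value, numeric)
```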
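Example: (Geometric example) Let u be the direction of ⟨1, −1⟩.
Using the picture estimate $\left.\frac{\partial w}{\partial x}\right|_P$, $\left.\frac{\partial w}{\partial y}\right|_P$, and $\left.\frac{dw}{ds}\right|_{P, \mathbf{u}}$.
[Figure: level curves w = 5, w = 15 and w = 25 near the point P, with Δx, Δy and the direction u marked.]
By measuring from P to the next level curve in the
x direction we see that Δx ≈ −.5.
$$\Rightarrow\; \left.\frac{\partial w}{\partial x}\right|_P \approx \frac{\Delta w}{\Delta x} \approx \frac{10}{-.5} = -20.$$
Similarly, we get $\left.\frac{\partial w}{\partial y}\right|_P \approx 20$.
Measuring in the u direction we get Δs ≈ −.3
$$\Rightarrow\; \left.\frac{dw}{ds}\right|_{P, \mathbf{u}} \approx \frac{\Delta w}{\Delta s} \approx \frac{10}{-.3} = -33.3.$$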
Direction of maximum change:

The direction that gives the maximum rate of change is in the same direction as ∇w. The
proof of this uses equation (1). Let θ be the angle between ∇w and u. Then the geometric
form of the dot product says
$$\left.\frac{dw}{ds}\right|_{\mathbf{u}} = \nabla w \cdot \mathbf{u} = |\nabla w||\mathbf{u}| \cos\theta = |\nabla w| \cos\theta.$$
(In the last equation we dropped the |u| because it equals 1.) Now it is obvious that this is
greatest when θ = 0. That is, when ∇w and u are in the same direction.
Lagrange Multipliers
We will give the argument for why Lagrange multipliers work later. Here, we’ll look at
where and how to use them. Lagrange multipliers are used to solve constrained optimization
problems. That is, suppose you have a function, say f (x, y), for which you want to find
the maximum or minimum value. But, you are not allowed to consider all (x, y) while you
look for this value. Instead, the (x, y) you can consider are constrained to lie on some curve
or surface. There are lots of examples of this in science, engineering and economics, for
example, optimizing some utility function under budget constraints.
Lagrange multipliers problem:

Minimize (or maximize) w = f (x, y, z) constrained by g(x, y, z) = c.
Lagrange multipliers solution:
Local minima (or maxima) must occur at a critical point. This is a point where ∇f = λ∇g,
and g(x, y, z) = c.
Example: Making a box using a minimum amount of material.

A box is made of cardboard with double thick sides, a triple thick bottom, single thick front
and back and no top. Its volume is fixed at 3.
What dimensions use the least amount of cardboard?
Answer: We did this problem once before by solving for z in terms of x and y and substituting
for it. That led to an unconstrained optimization problem in x and y. Here we will
do it as a constrained problem. It is important to be able to do this because eliminating
one variable is not always easy.
The box shown has dimensions x, y, and z.
[Figure: a box with edges labeled x, y and z.]
The area of one side = yz. There are two double thick sides ⇒ cardboard used = 4yz.
The area of the front (and back) = xz. It is single thick ⇒ cardboard used = 2xz.
The area of the bottom = xy. It is triple thick ⇒ cardboard used = 3xy.
Thus, the total cardboard used is
w = f (x, y, z) = 4yz + 2xz + 3xy.
The fixed volume acts as the constraint. It forces a relation between x, y and z so they
can’t all be varied independently. The constraint is
V = xyz = 3.
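Our first job is to set up the equations to look for critical points. $\nabla f = \langle 2z + 3y,\ 4z + 3x,\ 4y + 2x \rangle$
and $\nabla V = \langle yz, xz, xy \rangle$.
The Lagrange multiplier equations are then
$$\nabla f = \lambda \nabla V \quad \text{and} \quad V = 3$$
$$\Leftrightarrow\; \langle 2z + 3y,\ 4z + 3x,\ 4y + 2x \rangle = \lambda \langle yz, xz, xy \rangle, \quad xyz = 3.$$
Next we solve these equations for critical points. We do this by solving for λ in each equation
(we call this solving symmetrically).
$$\frac{2z + 3y}{yz} = \lambda, \quad \frac{4z + 3x}{xz} = \lambda, \quad \frac{4y + 2x}{xy} = \lambda, \quad xyz = 3 \;\Rightarrow\; \frac{2}{y} + \frac{3}{z} = \frac{4}{x} + \frac{3}{z} = \frac{4}{x} + \frac{2}{y}$$
$$\Rightarrow\; \frac{2}{y} = \frac{4}{x} \;\Rightarrow\; x = 2y \quad \text{and} \quad \frac{3}{z} = \frac{2}{y} \;\Rightarrow\; z = \tfrac{3}{2}y.$$
Now, $xyz = 3 \Rightarrow 3y^3 = 3 \Rightarrow y = 1$.
Answer: $x = 2$, $y = 1$, $z = \tfrac{3}{2}$, $w = 18$.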
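The answer can be sanity-checked numerically: at (2, 1, 3/2) the componentwise ratios of ∇f to ∇V should all equal the same λ, and nearby points on the constraint should use more cardboard. The perturbation size 0.1 and the helper names below are illustrative assumptions.

```python
def f(x, y, z):
    return 4 * y * z + 2 * x * z + 3 * x * y


def grad_f(x, y, z):
    return (2 * z + 3 * y, 4 * z + 3 * x, 4 * y + 2 * x)


def grad_V(x, y, z):
    return (y * z, x * z, x * y)


x, y, z = 2.0, 1.0, 1.5

# If grad f = lambda * grad V, every componentwise ratio equals lambda.
ratios = [a / b for a, b in zip(grad_f(x, y, z), grad_V(x, y, z))]


def cardboard_on_constraint(x, y):
    # Stay on the constraint xyz = 3 by solving for z.
    return f(x, y, 3 / (x * y))


nearby = [cardboard_on_constraint(2 + dx, 1 + dy)
          for dx in (-0.1, 0.1) for dy in (-0.1, 0.1)]
print(ratios, f(x, y, z), min(nearby))
```

Sampling a few neighbors is only a spot check, not a proof that the critical point is a minimum.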
Sphere example:
Minimize w = y constrained to x2 + y 2 + z 2 = 1.
Answer: ∇f = ⟨0, 1, 0⟩, ∇g = ⟨2x, 2y, 2z⟩
∇f = λ∇g ⇒ ⟨0, 1, 0⟩ = λ⟨2x, 2y, 2z⟩ ⇒ x = z = 0.
Constraint ⇒ y = ±1. (y = −1 gives the minimum and y = 1 gives the maximum.)
Example: (checking the boundary)

A rectangle in the plane is placed in the first quadrant so that one corner O is at the origin
and the two sides adjacent to O are on the axes. The corner P opposite O is on the curve
x + 2y = 1. Using Lagrange multipliers find for which point P the rectangle has maximum
area. Say how you know this point gives the maximum.
Answer: We need some names
g(x, y) = x + 2y = 1 = the constraint and f (x, y) = xy = the area.
The gradients are: ∇g = i + 2j, ∇f = y i + x j.
Lagrange multipliers: ⇒ y = λ, x = 2λ, x + 2y = 1.
The first two equations ⇒ x = 2y;
Combine this with the third equation ⇒ 4y = 1.
⇒ y = 1/4, x = 1/2 ⇒ P = (1/2, 1/4).
[Figure: the rectangle with corner O at the origin and opposite corner P on the line x + 2y = 1.]
We know this is a maximum because the maximum occurs either at a critical point or on
the boundary. In this case, the boundary points are on the axes at (1, 0) and (0, 1/2), each of
which gives a rectangle with area = 0.
Example: (boundary at ∞)
A rectangle in the plane is placed in the first quadrant so that one corner O is at the origin
and the two sides adjacent to O are on the axes. The corner P opposite O is on the curve
xy = 1. Using Lagrange multipliers find for which point P the rectangle has minimum
perimeter. Say how you know this point gives the minimum.
Answer: Let g(x, y) = xy = 1 = the constraint and f (x, y) = 2x + 2y = the perimeter.
Gradients: ∇g = y i + x j, ∇f = 2i + 2j.
Lagrange multipliers: ⇒ 2 = λy, 2 = λx, xy = 1.
The first two equations ⇒ x = y;
Combine this with the third equation ⇒ $x^2 = 1$.
⇒ x = 1, y = 1 (since P is in the first quadrant) ⇒ P = (1, 1).
[Figure: the rectangle with corner O at the origin and opposite corner P on the curve xy = 1.]
We know this is a minimum because the minimum occurs either at a critical point or on the
boundary. In this case the boundary points are infinitely far out on the axes which gives a
rectangle with perimeter = ∞.
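Both rectangle examples can be spot-checked by brute force: parametrize the constraint by one variable and scan a grid. The grid spacings and ranges below are arbitrary choices for illustration, so the scan only locates the optimum to grid accuracy.

```python
# Maximum area with P on x + 2y = 1: take y in (0, 1/2) and x = 1 - 2y.
ys = [i / 1000 for i in range(1, 500)]
best_y = max(ys, key=lambda y: (1 - 2 * y) * y)
best_area_point = (1 - 2 * best_y, best_y)

# Minimum perimeter with P on xy = 1: take x in (0, 10) and y = 1/x.
xs = [i / 100 for i in range(1, 1000)]
best_x = min(xs, key=lambda x: 2 * x + 2 / x)
best_perimeter_point = (best_x, 1 / best_x)

print(best_area_point, best_perimeter_point)
```

The area tends to 0 at both ends of the first scan and the perimeter grows without bound at both ends of the second, which matches the boundary arguments in the text.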
Proof of Lagrange Multipliers
Here we will give two arguments, one geometric and one analytic, for why Lagrange multipliers
work.
Critical points
For the function w = f (x, y, z) constrained by g(x, y, z) = c (c a constant) the critical points
are defined as those points which satisfy the constraint and where ∇f is parallel to ∇g.
In equations:
$$\nabla f(x, y, z) = \lambda \nabla g(x, y, z) \quad \text{and} \quad g(x, y, z) = c.$$
Statement of Lagrange multipliers

For the constrained system local maxima and minima (collectively extrema) occur at the
critical points.
Geometric proof for Lagrange
(We only consider the two dimensional case, w = f (x, y) with constraint g(x, y) = c.)
For concreteness, we’ve drawn the constraint curve, g(x, y) = c, as a circle and some level
curves for w = f (x, y) = c with explicit (made up) values. Geometrically, we are looking
for the point on the circle where w takes its maximum or minimum values.
Now, start at the level curve with w = 17, which has no points on the circle. So, clearly, the
maximum value of w on the constraint circle is less than 17. Move down the level curves
until they first touch the circle when w = 14. Call the point where the first touch P . It is
clear that P gives a local maximum for w on g = c, because if you move away from P in
either direction on the circle you’ll be on a level curve with a smaller value.
Since the circle is a level curve for g, we know ∇g is perpendicular to it. We also know ∇f
is perpendicular to the level curve w = 14. Since the curves themselves are tangent, these
two gradients must be parallel.
Likewise, if you keep moving down the level curves, the last one to touch the circle will give
a local minimum and the same argument will apply.
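[Figure: the constraint circle g(x, y) = c with level curves w = 6, 8, 10, 12, 14, 15, 16, 17 of f; at the point P where the level curve w = 14 touches the circle, ∇f = λ∇g.]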
Analytic proof for Lagrange (in three dimensions)
Suppose f has a local maximum at P on the constraint surface.
Let $\mathbf{r}(t) = \langle x(t), y(t), z(t) \rangle$ be an arbitrary parametrized curve which lies on the constraint
surface and has (x(0), y(0), z(0)) = P. Finally, let h(t) = f (x(t), y(t), z(t)). The setup
guarantees that h(t) has a maximum at t = 0.
Taking a derivative using the chain rule in vector form gives
$$h'(t) = \nabla f|_{\mathbf{r}(t)} \cdot \mathbf{r}'(t).$$
Since t = 0 is a local maximum, we have
$$h'(0) = \nabla f|_P \cdot \mathbf{r}'(0) = 0.$$
Thus, $\nabla f|_P$ is perpendicular to any curve on the constraint surface through P.
This implies $\nabla f|_P$ is perpendicular to the surface. Since $\nabla g|_P$ is also perpendicular to the
surface we have proved $\nabla f|_P$ is parallel to $\nabla g|_P$. QED
Non-independent Variables
1. We give a worked example here. A fuller explanation will be given in the next session.
Let
$$w = x^3y^2 + x^2y^3 + y$$
and assume x and y satisfy the relation
$$x^2 + y^2 = 1.$$
We consider x to be the independent variable; then, because y depends on x, w is
ultimately a function of the single variable x.
a) Compute $\frac{dw}{dx}$ using implicit differentiation.
b) Compute $\frac{dw}{dx}$ using total differentials.
Answer:
a) Implicit differentiation means remembering that y is a function of x, e.g., $\frac{d(y^2)}{dx} = 2y\frac{dy}{dx}$.
Thus,
$$\frac{dw}{dx} = 3x^2y^2 + 2x^3y\frac{dy}{dx} + 2xy^3 + 3x^2y^2\frac{dy}{dx} + \frac{dy}{dx}.$$
Now we differentiate the constraint to find $\frac{dy}{dx}$.
$$x^2 + y^2 = 1 \;\Rightarrow\; 2x + 2y\frac{dy}{dx} = 0 \;\Rightarrow\; \frac{dy}{dx} = -\frac{x}{y}.$$
Substituting this in the equation for $\frac{dw}{dx}$ gives
$$\frac{dw}{dx} = 3x^2y^2 - 2x^3y\,\frac{x}{y} + 2xy^3 - 3x^2y^2\,\frac{x}{y} - \frac{x}{y} = 3x^2y^2 - 2x^4 + 2xy^3 - 3x^3y - \frac{x}{y}.$$
b) Taking total differentials of both w and the constraint equation gives
$$dw = 3x^2y^2\,dx + 2x^3y\,dy + 2xy^3\,dx + 3x^2y^2\,dy + dy = (3x^2y^2 + 2xy^3)\,dx + (2x^3y + 3x^2y^2 + 1)\,dy$$
$$2x\,dx + 2y\,dy = 0.$$
We can solve the second equation for dy and substitute in the equation for dw.
$$dy = -\frac{x}{y}\,dx \;\Rightarrow\; dw = (3x^2y^2 + 2xy^3)\,dx + (2x^3y + 3x^2y^2 + 1)\left(-\frac{x}{y}\right)dx = \left(3x^2y^2 - 2x^4 + 2xy^3 - 3x^3y - \frac{x}{y}\right)dx$$
Thus,
$$\frac{dw}{dx} = 3x^2y^2 - 2x^4 + 2xy^3 - 3x^3y - \frac{x}{y}.$$
Non-independent Variables
3. Abstract partial differentiation; rules relating partial derivatives

Often in applications, the function w is not given explicitly, nor are the equations connecting
the variables. Thus you need to be able to work with functions and equations just
given abstractly. The previous ideas work perfectly well, as we will illustrate. However, we
will need (as in section 2) to distinguish between
formal partial derivatives, written here fx , fy , . . . (calculated as if all the variables were
independent), and
actual partial derivatives, written ∂f /∂x, . . . , which take account of any relations between
the variables.
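Example 5. If $f(x, y, z) = xy^2z^4$, where $z = 2x + 3y$, the three formal derivatives are
$$f_x = y^2z^4, \quad f_y = 2xyz^4, \quad f_z = 4xy^2z^3,$$
while three of the many possible actual partial derivatives are (we use the chain rule)
$$\left(\frac{\partial f}{\partial x}\right)_y = f_x + f_z\left(\frac{\partial z}{\partial x}\right)_y = y^2z^4 + 8xy^2z^3;$$
$$\left(\frac{\partial f}{\partial y}\right)_x = f_y + f_z\left(\frac{\partial z}{\partial y}\right)_x = 2xyz^4 + 12xy^2z^3;$$
$$\left(\frac{\partial f}{\partial z}\right)_x = f_y\left(\frac{\partial y}{\partial z}\right)_x + f_z = \tfrac{2}{3}xyz^4 + 4xy^2z^3.$$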
Rules connecting partial derivatives. These rules are widely used in the applications,
especially in thermodynamics. Here we will use them as an excuse for further practice with
the chain rule and differentials.
With an eye to thermodynamics, we assume a set of variables t, u, v, w, x, y, z, . . .
connected by several equations in such a way that
• any two are independent;
• any three are connected by an equation.
Thus, one can choose any two of them to be the independent variables, and then each of
the other variables can be expressed in terms of these two.
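We give each rule in two forms; the second form is the one ordinarily used, while the
first is easier to remember. (The first two rules are fairly simple in either form.)
$$\left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial x}\right)_z = 1, \qquad \left(\frac{\partial x}{\partial y}\right)_z = \frac{1}{(\partial y/\partial x)_z} \qquad \text{reciprocal rule} \tag{8a,b}$$
$$\left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial t}\right)_z = \left(\frac{\partial x}{\partial t}\right)_z, \qquad \left(\frac{\partial x}{\partial y}\right)_z = \frac{(\partial x/\partial t)_z}{(\partial y/\partial t)_z} \qquad \text{chain rule} \tag{9a,b}$$
$$\left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial z}\right)_x \left(\frac{\partial z}{\partial x}\right)_y = -1, \qquad \left(\frac{\partial x}{\partial y}\right)_z = -\frac{(\partial x/\partial z)_y}{(\partial y/\partial z)_x} \qquad \text{cyclic rule} \tag{10a,b}$$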
Note how the successive factors in the cyclic rule are formed: the variables are used in the
successive orders x, y, z; y, z, x; z, x, y; one says they are permuted cyclically, and this
explains the name.
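The cyclic rule (10a) can be checked numerically on the ideal gas law P = kT/V, taking x, y, z to be P, V, T. This is only a sketch: the value `K = 1.0` for the constant k, the sample state and the finite-difference helper `partial` are assumptions made for illustration.

```python
K = 1.0  # hypothetical value for the constant k in P = kT/V


def P_of(V, T):
    return K * T / V


def V_of(T, P):
    return K * T / P


def T_of(P, V):
    return P * V / K


def partial(g, a, b, h=1e-6):
    """Central difference of g in its first argument, holding the second fixed."""
    return (g(a + h, b) - g(a - h, b)) / (2 * h)


V0, T0 = 2.0, 3.0
P0 = P_of(V0, T0)

# (dP/dV)_T * (dV/dT)_P * (dT/dP)_V, each factor with a different variable held fixed.
product = partial(P_of, V0, T0) * partial(V_of, T0, P0) * partial(T_of, P0, V0)
print(product)
```

The product comes out close to −1 rather than the +1 one might guess from cancelling the symbols, which is the point of the rule.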
Proof of the rules. The first two rules are simple: since z is being held fixed throughout,
each variable becomes a function of just one other variable, and (9) is just the one-variable
chain rule. Then (8) is just the special case of (9) where x = t.
The cyclic rule is less obvious — on the right side it looks almost like the chain rule, but
different variables are being held constant in each of the differentiations, and this changes it
entirely. To prove it, we suppose f (x, y, z) = 0 is the equation satisfied by x, y, z; taking y
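and z as the independent variables and differentiating f (x, y, z) = 0 with respect to y gives:
$$f_x\left(\frac{\partial x}{\partial y}\right)_z + f_y = 0; \quad \text{therefore} \quad \left(\frac{\partial x}{\partial y}\right)_z = -\frac{f_y}{f_x}. \tag{11}$$
Permuting the variables in (11) and multiplying the resulting three equations gives (10a):
$$\left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial z}\right)_x \left(\frac{\partial z}{\partial x}\right)_y = \left(-\frac{f_y}{f_x}\right) \cdot \left(-\frac{f_z}{f_y}\right) \cdot \left(-\frac{f_x}{f_z}\right) = -1.$$
Example 6. Suppose w = w(x, r), with r = r(x, θ). Give an expression for $\left(\frac{\partial w}{\partial r}\right)_\theta$ in
terms of formal partial derivatives of w and r.
Solution. Evidently the independent variables are to be r and θ, since these are the
ones that occur in the lower part of the partial derivative, with x dependent on them. Since
θ is viewed as a constant, the chain rule gives
$$\left(\frac{\partial w}{\partial r}\right)_\theta = w_x\left(\frac{\partial x}{\partial r}\right)_\theta + w_r;$$
$$\left(\frac{\partial x}{\partial r}\right)_\theta = \frac{1}{(\partial r/\partial x)_\theta},$$
by the reciprocal rule (8), and therefore finally,
$$\left(\frac{\partial w}{\partial r}\right)_\theta = \frac{w_x}{r_x} + w_r.$$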
18.02 Problem Set 4
At MIT problem sets are referred to as 'psets'. You will see this term used occasionally
within the problem sets.
The 18.02 psets are split into two parts, 'Part I' and 'Part II'. The Part I problems are all taken
from the supplementary problems. You will find a link to the supplementary problems
and solutions on this website. The intention is that these help the student develop
some fluency with concepts and techniques. Students have access to the solutions
while they do the problems, so they can check their work or get a little help as they
do the problems. After you finish the problems go back and redo the ones for which
you needed help from the solutions.
The part II problems are more involved. At MIT the students do not have access
to the solutions while they work on the problems. They are encouraged to work
together, but they have to write their solutions independently.
Part I (15 points)
At MIT the underlined problems must be done and turned in for grading.
The ‘Others’ are some suggested choices for more practice.
A listing like ’§1B : 2, 5b, 10’ means do the indicated problems from supplementary
problems section 1B.
1 Functions of several variables. Level Curves. Partial derivatives. Tangent plane.
Linear Approximation
§2A: 1c, 2be, 3b, 5a; Others: 1abd, 2acd, 3ac, 4, 5b
§2B: 1b, 6, 9; Others: 1a, 2, 3, 5
2 Max-min problems. Least Squares.

§2F: 1a, 2; Others: 1b, 5;
§2G: 1c, 4; Others: 1ab
3 Second-derivative test. Boundaries. Applications

§2H: 1c,3, 4, 6; Others: 1abe, 7
Part II (29 points)
Problem 1 (5: 3,2)

In this problem we’ll use the ‘Mathlet’ Functions of Two Variables (with link on
course webpage)
Let f (u) be a function of one variable and let $F_c(x, t) = f(x - ct)$.
a) Use the plotting feature of the mathlet to examine the graphs of $F_c$ in the cases
c = −1, 1, 2 for $f(u) = (\cos u)^2$.
Note: you will need to use y for t when using Input to type the functions into the
applet, since that’s the only other letter allowed.
Then for c = 2 make a sketch by hand of the graphs of the single-variable function
$x \mapsto F_2(x, t)$ for the values t = −1, 0, 1.
b) If the variable t represents time, and x the position on a one-dimensional string,
and f measures the displacement of the string perpendicular to the x-axis, what
phenomenon is being described by the function $F_2(x, t)$?
Problem 2 (5: 2,2,1)

a) Find the curve of intersection of the surfaces $z = x^2 - y^2$ and $z = 2 + (x - y)^2$
in parametric form.
b) Find the angle of intersection of these two surfaces at the point (2, 1, 3).
(The angle of intersection of two surfaces is defined to be the angle made by their
tangent planes.)
c) Check that the tangent vector to the curve of intersection found in part(a) at the
point (2, 1, 3) lies in (i.e. is parallel to) the tangent plane of each of the two surfaces.
Problem 3 (7: 1,1,1,2,2)

Let $f(x, y) = \frac{4}{1 + x^2 + y^2}$ and let S be the surface given by the graph of f (x, y).
a) Make a sketch of the surface S in 3-space, and also a (separate) sketch of the
contour plot of f .
b) Let $C_2$ denote the curve in the xy-plane given by $r_2(t) = (t, \frac{3}{2} - t^2)$; and let C
denote the curve on the surface S which has $C_2$ as its shadow in the xy-plane. Find
the parametric equations r = r(t) for C.
c) Sketch C on the picture of S from part (a), and also $C_2$ on the contour plot of
part (a).
d) Let z(t) denote the z-component of the parametric equations r = r(t) of C found
in part(b). Find the points where z(t) has its local maxima and minima, and add
these in to the sketch of part(a).
e) Set up the function h(t) which gives the square of the distance from the origin to
a variable point on the curve C2 , and then find the local maxima and minima of h(t).
How do these points relate to the maxima and minima found in part(d), and why?
Problem 4 (5)
Suppose that three non-negative numbers are restricted by the condition that the
sum of their squares is equal to 27. Using critical point analysis, with 2nd derivative
and/or boundary tests as needed, find the maximum and minimum values of the sum
of their cubes.
Problem 5 (7: 4,3)

We return to problem 3 on p-set 1, where we computed the component of the wind
vector w which pointed against the wind after projecting w twice.
The problem now is to find the combination of projections of w = (1, 0) onto the
vectors w1 and then onto w2 which yields the second projection which points most
strongly into the wind, that is, the one with the most negative i component.
In p-set 1, problem 3 we found the i component of w2 was
cos(α) cos(β) cos(α + β).
a) What choice(s) of α and β will give the most negative i component of w2 ?

b) What fraction of the initial force of the wind is this?
This is the fraction of the force of the wind a sailboat can use to go against the wind
with tacking.
Suggestion for Mathlet Experiments

Here are some suggested experiments to familiarize yourself with the Level Curves and
Partial Derivatives selections in the applet Functions of Two Variables. These
are optional, but people often find them useful for strengthening 3D visualization.
With the function $f(x, y) = y^2 - x^2$ start to familiarize yourself with the applet.
Try out different windows, in addition to [−2, 2] × [−2, 2], to get a better picture
of the part of the graph around the saddle point. Read the directions given, and
then try out the different input and mouse-controlled features to see how they work.
(Note also that the surface graph may be rotated in different directions to get a better
view of the features of interest.) Also try these out on the given default example
f (x, y) = x(x − 1)(x − 2) + (y − 1)(x − y) for more practice. Then:
Using the Partial Derivatives applet for the function $f(x, y) = y^2 - x^2$ describe
the behavior of $f_x$ and $f_y$ when you start at the saddle point (0,0) on the contour
plot and then move the point in the following directions: E, NE, N, NW, W, SW,
S, and SE (where e.g. SE means along the ray θ = −π/4, etc.). Check the geometric
pictures against the numerical results. If an intrepid mountaineer were moving along
this surface, what would the partial derivative information be telling you about the
rate of climb/descent on the mountain in the x- and y-directions for the E, N, W, S
routes on the mountain?
2. Partial Differentiation
2A. Functions and Partial Derivatives
2A-1 Sketch five level curves for each of the following functions. Also, for a-d, sketch the
portion of the graph of the function lying in the first octant; include in your sketch the
traces of the graph in the three coordinate planes, if possible.
a) $1 - x - y$ b) $\sqrt{x^2 + y^2}$ c) $x^2 + y^2$ d) $1 - x^2 - y^2$ e) $x^2 - y^2$
2A-2 Calculate the first partial derivatives of each of the following functions:
a) $w = x^3 y - 3xy^2 + 2y^2$ b) $z = \frac{x}{y}$ c) $\sin(3x + 2y)$ d) $e^{x^2 y}$
e) $z = x \ln(2x + y)$ f) $x^2 z - 2yz^3$
2A-3 Verify that $f_{xy} = f_{yx}$ for each of the following:
a) $x^m y^n$, (m, n positive integers) b) $\frac{x}{x + y}$ c) $\cos(x^2 + y)$
d) f (x)g(y), for any differentiable f and g
2A-4 By using $f_{xy} = f_{yx}$, tell for what value of the constant a there exists a function
f (x, y) for which $f_x = axy + 3y^2$, $f_y = x^2 + 6xy$, and then using this value, find such a
function by inspection.
2A-5 Show the following functions w = f (x, y) satisfy the equation $w_{xx} + w_{yy} = 0$ (called
the two-dimensional Laplace equation):
a) $w = e^{ax} \sin ay$ (a constant) b) $w = \ln(x^2 + y^2)$
2B. Tangent Plane; Linear Approximation

2B-1 Give the equation of the tangent plane to each of these surfaces at the point indicated.
a) $z = xy^2$, (1, 1, 1) b) $w = y^2/x$, (1, 2, 4)
2B-2 a) Find the equation of the tangent plane to the cone $z = \sqrt{x^2 + y^2}$ at the point
$P_0 : (x_0, y_0, z_0)$ on the cone.
b) Write parametric equations for the ray from the origin passing through P0 , and
using them, show the ray lies on both the cone and the tangent plane at P0 .
2B-3 Using the approximation formula, find the approximate change in the hypotenuse of
a right triangle, if the legs, initially of length 3 and 4, are each increased by .010 .
2B-4 The combined resistance R of two wires in parallel, having resistances $R_1$ and $R_2$
respectively, is given by
$\frac{1}{R} = \frac{1}{R_1} + \frac{1}{R_2}$.
If the resistances in the wires are initially 1 and 2 ohms, with a possible error in each
of ±.1 ohm, what is the value of R, and by how much might this be in error? (Use the
approximation formula.)
2B-5 Give the linearizations of each of the following functions at the indicated points:
a) $(x + y + 2)^2$ at (0, 0); at (1, 2) b) $e^x \cos y$ at (0, 0); at (0, π/2)
2B-6 To determine the volume of a cylinder of radius around 2 and height around 3, about
how accurately should the radius and height be measured for the error in the calculated
volume not to exceed .1 ?
2B-7 a) If x and y are known to within .01, with what accuracy can the polar coordinates
r and θ be calculated? Assume x = 3, y = 4.
b) At this point, are r and θ more sensitive to small changes in x or in y? Draw a
picture showing x, y, r, θ and confirm your results by using geometric intuition.
2B-8* Two sides of a triangle are a and b, and θ is the included angle. The third side is c.
a) Give the approximation for Δc in terms of a, b, c, θ, and Δa, Δb, Δθ.
b) If a = 1, b = 2, θ = π/3, is c more sensitive to small changes in a or b?
2B-9 a) Around the point (1, 0), is $w = x^2(y + 1)$ more sensitive to changes in x or in y?
b) What should the ratio of Δy to Δx be in order that small changes with this ratio
produce no change in w, i.e., no first-order change? (Of course w will change a little, but
like $(\Delta x)^2$, not like Δx.)
2B-10* a) If |a| is much larger than |b|, |c|, and |d|, to which entry is the value of $\begin{vmatrix} a & b \\ c & d \end{vmatrix}$
most sensitive?
b) Given a 3 × 3 determinant, how would you determine to which entry the value
of the determinant is most sensitive? (Consider the various Laplace expansions by the
cofactors of a given row or column.)
2C. Differentials; Approximations

2C-1 Find the differential (dw or dz). Make the answer look as neat as possible.
a) $w = \ln(xyz)$ b) $w = x^3 y^2 z$ c) $z = \frac{x - y}{x + y}$ d) $w = \sin^{-1}\frac{u}{t}$ (use $\sqrt{t^2 - u^2}$)
2C-2 The dimensions of a rectangular box are 5, 10, and 20 cm., with a possible measurement
error in each side of ±.1 cm. Use differentials to find what possible error should be
attached to its volume.
2C-3 Two sides of a triangle have lengths respectively a and b, with θ the included angle.
Let A be the area of the triangle.
a) Express dA in terms of the variables and their differentials.
b) If a = 1, b = 2, θ = π/6, to which variable is A most sensitive? least sensitive?
c) Using the values in (b), if the possible error in each value is .02, what is the possible
error in A, to two decimal places?
2C-4 The pressure, volume, and temperature of an ideal gas confined to a container are
related by the equation P V = kT , where k is a constant depending on the amount of gas
and the units. Calculate dP two ways:
a) Express P in terms of V and T , and calculate dP as usual.
b) Calculate the differential of both sides of the equation, getting a “differential
equation”, and then solve it algebraically for dP .
c) Show the two answers agree.
2C-5 The following equations define w implicitly as a function of the other variables.
Find dw in terms of all the variables by taking the differential of both sides and solving
algebraically for dw.
a) $\frac{1}{w} = \frac{1}{t} + \frac{1}{u} + \frac{1}{v}$ b) $u^2 + 2v^2 + 3w^2 = 10$
2D. Gradient and Directional Derivative

2D-1 In each of the following, a function f , a point P , and a vector A are given. Calculate
the gradient of f at the point, and the directional derivative $\left.\frac{df}{ds}\right|_{\mathbf u}$ at the point, in the
direction u of the given vector A.
a) $x^3 + 2y^3$; (1, 1), i − j b) $w = \frac{xy}{z}$; (2, −1, 1), i + 2 j − 2 k
c) z = x sin y + y cos x; (0, π/2), −3 i + 4 j d) w = ln(2t + 3u); (−1, 1), 4 i − 3 j
e) f (u, v, w) = (u + 2v + 3w)2 ; (1, −1, 1), −2 i + 2 j − k
2D-2 For the following functions, each with a given point P,
(i) find the maximum and minimum values of $\left.\frac{df}{ds}\right|_{\mathbf u}$, as u varies;
(ii) tell for which directions the maximum and minimum occur;
(iii) find the direction(s) u for which $\left.\frac{df}{ds}\right|_{\mathbf u} = 0$.
a) w = ln(4x − 3y), (1, 1) b) w = xy + yz + xz, (1, −1, 2)
c) $z = \sin^2(t - u)$, (π/4, 0)
2D-3 By viewing the following surfaces as a contour surface of a function f (x, y, z), find
its tangent plane at the given point.
a) $xy^2 z^3 = 12$, (3, 2, 1); b) the ellipsoid $x^2 + 4y^2 + 9z^2 = 14$, (1, 1, 1)
c) the cone $x^2 + y^2 - z^2 = 0$, $(x_0, y_0, z_0)$ (simplify your answer)
2D-4 The function $T = \ln(x^2 + y^2)$ gives the temperature at each point in the plane (except
(0, 0)).
a) At the point P : (1, 2), in which direction should you go to get the most rapid increase
in T ?
b) At P , about how far should you go in the direction found in part (a) to get an increase
of .20 in T ?
c) At P , approximately how far should you go in the direction of i + j to get an increase
of about .12?
d) At P , in which direction(s) will the rate of change of temperature be 0?
2D-5 The function $T = x^2 + 2y^2 + 2z^2$ gives the temperature at each point in space.
a) What shape are the isotherms?
b) At the point P : (1, 1, 1), in which direction should you go to get the most rapid
decrease in T ?
c) At P , about how far should you go in the direction of part (b) to get a decrease of 1.2
in T ?
d) At P , approximately how far should you go in the direction of i − 2 j + 2 k to get an
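increase of .10?
2D-6 Show that ∇(uv) = u∇v + v∇u, and deduce that $\left.\frac{d(uv)}{ds}\right|_{\mathbf u} = u \left.\frac{dv}{ds}\right|_{\mathbf u} + v \left.\frac{du}{ds}\right|_{\mathbf u}$.
(Assume that u and v are functions of two variables.)
2D-7 Suppose $\left.\frac{dw}{ds}\right|_{\mathbf u} = 2$, $\left.\frac{dw}{ds}\right|_{\mathbf v} = 1$ at P , where $\mathbf u = \frac{\mathbf i + \mathbf j}{\sqrt 2}$, $\mathbf v = \frac{\mathbf i - \mathbf j}{\sqrt 2}$. Find $(\nabla w)_P$.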
(This illustrates that the gradient can be calculated knowing the directional derivatives
in any two non-parallel directions, not just the two standard directions i and j .)
2D-8 The atmospheric pressure in a region of space near the origin is given by the formula
$P = 30 + (x + 1)(y + 2)e^z$. Approximately where is the point closest to the origin at which
the pressure is 31.1?
2D-9 The accompanying picture shows the level curves of a function w = f (x, y). The
value of w on each curve is marked. A unit distance is given.
[Figure: level curves of w labeled 1, 2, 3, 4, 5, with marked points P, Q, and A, and a unit-distance scale.]
a) Draw in the gradient vector at A.
b) Find a point B where w = 3 and ∂w/∂x = 0.
c) Find a point C where w = 3 and ∂w/∂y = 0.
d) At the point P estimate the value of ∂w/∂x and ∂w/∂y.
e) At the point Q, estimate dw/ds in the direction of i + j .
f) At the point Q, estimate dw/ds in the direction of i − j .
g) Approximately where is the gradient 0?
2E. Chain Rule

2E-1 In the following, find $\frac{df}{dt}$ for the composite function f (x(t), y(t), z(t)) in two ways:
(i) use the chain rule, then express your answer in terms of t by using x = x(t), etc.;
(ii) express the composite function f in terms of t, and differentiate.
a) $w = xyz$, $x = t$, $y = t^2$, $z = t^3$ b) $w = x^2 - y^2$, x = cos t, y = sin t
c) $w = \ln(u^2 + v^2)$, u = 2 cos t, v = 2 sin t
2E-2 In each of these, information about the gradient of an unknown function f (x, y) is
given; x and y are in turn functions of t. Use the chain rule to find out additional information
about the composite function $w = f\big(x(t), y(t)\big)$, without trying to determine f explicitly.
a) ∇w = 2 i + 3 j at P : (1, 0); x = cos t, y = sin t. Find the value of $\frac{dw}{dt}$ at t = 0.
b) ∇w = y i + x j ; x = cos t, y = sin t. Find $\frac{dw}{dt}$ and tell for what t-values it is zero.
c) $\nabla f = \langle 1, -1, 2\rangle$ at (1, 1, 1). Let $x = t$, $y = t^2$, $z = t^3$; find $\frac{df}{dt}$ at t = 1.
d) $\nabla f = \langle 3x^2 y, x^3 + z, y\rangle$; $x = t$, $y = t^2$, $z = t^3$. Find $\frac{df}{dt}$.
2E-3 a) Use the chain rule for f (u, v), where u = u(t), v = v(t), to prove the product rule
D(uv) = v Du + u Dv, where $D = \frac{d}{dt}$.
b) Using the chain rule for f (u, v, w), derive a similar product rule for $\frac{d}{dt}(uvw)$, and use
it to differentiate $t e^{2t} \sin t$.
c)* Derive similarly a rule for the derivative $\frac{d}{dt} u^v$, and use it to differentiate $(\ln t)^t$.
2E-4 Let w = f (x, y), and assume that ∇w = 2 i + 3 j at (0, 1). If $x = u^2 - v^2$, y = uv,
find $\frac{\partial w}{\partial u}$, $\frac{\partial w}{\partial v}$ at u = 1, v = 1.
2E-5 Let w = f (x, y), and suppose we change from rectangular to polar coordinates:
x = r cos θ, y = r sin θ.
a) Show that $(w_x)^2 + (w_y)^2 = (w_r)^2 + \frac{1}{r^2}(w_\theta)^2$.
b) Suppose ∇w = 2 i − j at the point x = 1, y = 1. Find $\frac{\partial w}{\partial r}$ and $\frac{\partial w}{\partial \theta}$ when
$r = \sqrt{2}$, θ = π/4, and verify the relation in part (a), at the point.
2E-6 Let w = f (x, y), and make the change of variables $x = u^2 - v^2$, y = 2uv. Show
$(w_x)^2 + (w_y)^2 = \frac{(w_u)^2 + (w_v)^2}{4(u^2 + v^2)}$
2E-7 The Jacobian matrix for the change of variables x = x(u, v), y = y(u, v) is defined
to be $J = \begin{pmatrix} x_u & x_v \\ y_u & y_v \end{pmatrix}$. Let ∇f (x, y) be represented as the row vector $\langle f_x \; f_y \rangle$.
Show that
$\nabla f\big(x(u, v), y(u, v)\big) = \nabla f(x, y) \cdot J$ (matrix multiplication).
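The identity in 2E-7 can be explored numerically before proving it. This is only a sanity check under assumed inputs, not a proof: the function f below is made up, the change of variables is borrowed from 2E-6, and all names are hypothetical. The row vector ⟨f_x f_y⟩ is multiplied into J by hand.

```python
import math

def central_diff(g, a, h=1e-6):
    """Central-difference estimate of g'(a)."""
    return (g(a + h) - g(a - h)) / (2 * h)

def f(x, y):
    return x * x * y + math.sin(y)   # hypothetical f(x, y)

def x_of(u, v):
    return u * u - v * v

def y_of(u, v):
    return 2 * u * v

def grad_uv_direct(u, v):
    """Gradient of the composite f(x(u,v), y(u,v)) with respect to (u, v)."""
    F = lambda a, b: f(x_of(a, b), y_of(a, b))
    return (central_diff(lambda s: F(s, v), u),
            central_diff(lambda s: F(u, s), v))

def grad_uv_via_jacobian(u, v):
    """Row vector <f_x, f_y> times the Jacobian matrix J."""
    x, y = x_of(u, v), y_of(u, v)
    fx = central_diff(lambda s: f(s, y), x)
    fy = central_diff(lambda s: f(x, s), y)
    J = [[central_diff(lambda s: x_of(s, v), u), central_diff(lambda s: x_of(u, s), v)],
         [central_diff(lambda s: y_of(s, v), u), central_diff(lambda s: y_of(u, s), v)]]
    return (fx * J[0][0] + fy * J[1][0],
            fx * J[0][1] + fy * J[1][1])

print(grad_uv_direct(1.0, 0.5))
print(grad_uv_via_jacobian(1.0, 0.5))  # should match the line above closely
```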
2E-8 a) Let w = f (y/x); i.e., w is the composite of the functions w = f (u), u = y/x.
Show that w satisfies the PDE (partial differential equation) $x \frac{\partial w}{\partial x} + y \frac{\partial w}{\partial y} = 0$.
b)* Let $w = f(x^2 - y^2)$; show that w satisfies the PDE $y \frac{\partial w}{\partial x} + x \frac{\partial w}{\partial y} = 0$.
c)* Let w = f (ax + by); show that w satisfies the PDE $b \frac{\partial w}{\partial x} - a \frac{\partial w}{\partial y} = 0$.
2F. Maximum-minimum Problems
2F-1 Find the point(s) on each of the following surfaces which is closest to the origin.
(Hint: it’s easier to minimize the square of the distance, rather than the distance itself.)
a) $xyz^2 = 1$ b) $x^2 - yz = 1$
2F-2 A rectangular produce box is to be made of cardboard; the sides of single thickness,
the front and back of double thickness, and the bottom of triple thickness, with the top
left open. Its volume is to be 1 cubic foot; what proportions for the sides will use the least
cardboard?
2F-3* Consider all planes passing through the point (2,1,1) and such that the intercepts
on the three coordinate axes are all positive. For which of these planes is the product of the
three intercepts smallest? (Hint: take the plane in the form z = ax + by + c, where a and b
are the independent variables.)
2F-4* Find the extremal point of $x^2 + 2xy + 4y^2 + 6$ and show it is a minimum point by
completing the square.
2F-5 A drawer in a chest has an open top; the bottom and back are made of cheap wood
costing $1/sq. ft; the sides have to be thicker, and cost $2/sq.ft., while the front costs
$4/sq.ft. for the better quality wood and finishing. The volume is to be 2.5 cu. ft. What
dimensions will produce the drawer costing the least to manufacture?
2G. Least-squares Interpolation
2G-1 Find by the method of least squares the line which best fits the three data points
given. Do it from scratch, using the formula
$D = \sum_{i=1}^{n} (y_i - (a x_i + b))^2$,
which was in the reading on least squares, and differentiation (use the chain rule). Sketch
the line and the three points as a check.
a)* (0, 0), (0, 2), (1, 3) b)* (0, 0), (1, 2), (2, 1) c) (1, 1), (2, 3), (3, 2)
2G-2* Show that the equations (4) for the method of least squares have a unique solution,
unless all the $x_i$ are equal. Explain geometrically why this exception occurs.
Hint: use the fact that for all values of u, we have $\sum_{1}^{n} (x_i - u)^2 \ge 0$, since
squares are always non-negative. Write the left side as a quadratic polynomial in
u. Usually it has no roots. What does this imply about the coefficients? When
does it have a root? (Answer these two questions by using the quadratic formula.)
2G-3* Use least squares to fit a second degree polynomial exactly through the points
(−1, −1), (0, 0), (1, 3) (you might want to go back and read the last section in the note
about least squares).
2G-4 What linear equations in a, b, c does the method of least squares lead to, when you
use it to fit a linear function z = a + bx + cy to a set of data points (xi , yi , zi ), i = 1, . . . , n?
2G-5* What equations are you led to for determining a when you try to fit the exponential
curve $y = e^{ax}$ to two data points $(1, y_1)$, $(2, y_2)$ by the method of least squares?
The moral is: don’t do it this way. In general to fit an exponential $y = ce^{ax}$
to a set of data points $(x_i, y_i)$, take the log of both sides:
ln y = a x + ln c
This gives a linear function in the variables x and ln y, whose coefficients a and
ln c can be determined by applying the method of least squares to fit the data
points $(x_i, \ln y_i)$.
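The log-transform recipe just described can be sketched in a few lines. This is a minimal illustration, assuming all y_i > 0; the function names fit_line and fit_exponential and the sample data are hypothetical, and the line fit solves the usual two normal equations directly.

```python
import math

def fit_line(xs, ys):
    """Least-squares line y = a x + b, from the two normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx          # zero only when all x_i are equal
    a = (n * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b

def fit_exponential(xs, ys):
    """Fit y = c e^{a x} by fitting a line to the points (x_i, ln y_i)."""
    a, ln_c = fit_line(xs, [math.log(y) for y in ys])
    return a, math.exp(ln_c)

# Hypothetical data lying exactly on y = 2 e^{0.5 x}
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a, c = fit_exponential(xs, ys)
print(a, c)  # should recover roughly a = 0.5, c = 2
```

One caveat worth keeping in mind: this minimizes the squared deviations of ln y, which is not the same as minimizing the squared deviations of y itself, so on noisy data the two approaches generally give somewhat different curves.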
2H. Max-min: 2nd Derivative Criterion; Boundary Curves

2H-1 For each of the following functions, find the critical points, and classify them using
the 2nd-derivative criterion.
a) $x^2 - xy - 2y^2 - 3x - 3y + 1$ b) $3x^2 + xy + y^2 - x - 2y + 4$ c) $2x^4 + y^2 - xy + 1$
d) $x^3 - 3xy + y^3$ e) $(x^3 + 1)(y^3 + 1)$
2H-2* Use the 2nd-derivative criterion to verify that the critical point $(m_0, b_0)$ determining
the regression (= least-squares) line $y = m_0 x + b_0$ really minimizes the function D(m, b)
giving the sum of the squares of the deviations. (You will need the inequality in problem
1B-15, for n-vectors $A = \langle a_1, a_2, \ldots, a_n \rangle$, defining $|A| = \sqrt{\sum a_i^2}$ and $A \cdot B = \sum a_i b_i$.)
2H-3 Find the maximum and minimum of the function $f(x, y) = x^2 + y^2 + 2x + 4y - 1$
in the right half-plane R bounded by the diagonal line y = −x.
2H-4 Find the maximum and minimum points of the function xy − x − y + 2 on

a) the first quadrant b) the square R : 0 ≤ x ≤ 2, 0 ≤ y ≤ 2
(study its values at the unique critical point and on the boundary lines);
c) use the data to guess the critical point type, and confirm it by the second derivative
test.
2H-5 Find the maximum and minimum points of the function $f(x, y) = x + \sqrt{3}\, y + 2$ on
the unit disc $R : x^2 + y^2 \le 1$.
2H-6 a) Two wires of length 4 are cut in the same way into three pieces, of length x, y
and z; the four x, y pieces are used as the four sides of a rectangle; the two z pieces are bent
at the middle and joined at the ends to make a square of side z/2. Find the rectangle and
square made this way which together have the largest and the smallest total area.
Using the answer, tell what type the critical point is.
b) Confirm the critical point type by using the second derivative test.
2H-7 a) Find the maximum and minimum points of the function $2x^2 - 2xy + y^2 - 2x$
on the rectangle R : 0 ≤ x ≤ 2; −1 ≤ y ≤ 2; using this information, determine the type of
the critical point.
b) Confirm the critical point type by using the second derivative test.
2I. Lagrange Multipliers

2I-1 A rectangular box is placed in the first octant so that one corner Q is at the origin and
the three sides adjacent to Q lie in the coordinate planes. The corner P diagonally opposite
Q lies on the surface f (x, y, z) = c. Using Lagrange multipliers, tell for which point P the
box will have the largest volume, and tell how you know it gives a maximum point, if the
surface is
a) the plane x + 2y + 3z = 18 b) the ellipsoid $x^2 + 2y^2 + 4z^2 = 12$.
2I-2 Using Lagrange multipliers, tell which point P in the first octant and on the surface
$x^3 y^2 z = 6\sqrt{3}$ is closest to the origin. (As usual, it is easier algebraically to minimize $|OP|^2$
rather than |OP |.)
2I-3 (Repeat of 2F-2, but this time use Lagrange multipliers.) A rectangular produce box
is to be made of cardboard; the sides of single thickness, the ends of double thickness, and
the bottom of triple thickness, with the top left open. Its volume is to be 1 cubic foot; what
should be its proportions in order to use the least cardboard?
2I-4 In an open-top wooden drawer, the two sides and back cost $2/sq.ft., the bottom
$1/sq.ft. and the front $4/sq.ft. Using Lagrange multipliers, show that the following
problems lead to the same set of three equations in λ, plus a different fourth equation, and they
have the same solution.
a) Find the dimensions of the drawer with largest capacity that can be made for a total
wood cost of $72.
b) Find the dimensions of the most economical drawer having volume 24 cu. ft.
2J. Non-independent Variables

All references are to the Examples and numbered equations in Notes N.
2J-1 In Example 1, calculate by direct substitution: a) $\left(\frac{\partial w}{\partial y}\right)_z$ b) $\left(\frac{\partial w}{\partial z}\right)_y$.
2J-2 Calculate the two derivatives in 2J-1 by using
(i) the chain rule (differentiate $z = x^2 + y^2$ implicitly) (ii) differentials
2J-3 In Example 2, using the chain rule calculate, in terms of x, y, z, t, the derivatives
a) $\left(\frac{\partial w}{\partial t}\right)_{x,z}$ b) $\left(\frac{\partial w}{\partial z}\right)_{x,y}$
2J-4 Repeat 2J-3, doing the calculation using differentials.
2J-5 Let S = S(p, v, T ) be the entropy of a gas, assumed to obey the ideal gas law (1).
Give expressions in terms of the formal partial derivatives $S_p$, $S_v$, and $S_T$ for
a) $\left(\frac{\partial S}{\partial p}\right)_v$ b) $\left(\frac{\partial S}{\partial T}\right)_v$
2J-6 If $w = u^3 - uv^2$, u = xy, v = u + x, find $\left(\frac{\partial w}{\partial u}\right)_x$ and $\left(\frac{\partial w}{\partial x}\right)_u$ using
a) the chain rule b) differentials.
2J-7 Let P be the point (1, −1, 1), and assume $z = x^2 + y + 1$, and that f (x, y, z) is a
differentiable function for which ∇f (x, y, z) = 2 i + j − 3 k at P .
Let g(x, z) = f (x, y(x, z), z); find ∇g at the point (1, 1), i.e., x = 1, z = 1.
2J-8 Interpreting r, θ as polar coordinates, let $w = \sqrt{r^2 - x^2}$.
a) Calculate $\left(\frac{\partial w}{\partial r}\right)_\theta$, by first writing w in terms of r and θ.
b)* Then calculate it by substituting into the final formula given
in Example 6.
c)* Finally, obtain the answer by intuitive geometrical reasoning (see picture).
[Figure: triangle with sides labeled r, w, x and angle θ.]
2K. Partial Differential Equations

2K-1. Show that w = ln r, where $r = \sqrt{x^2 + y^2}$ is the usual polar coordinate, satisfies
the two-dimensional Laplace equation (Notes P (1), without z), if (x, y) ≠ (0, 0). What’s
wrong with (0, 0)?
(The calculation will go faster if you remember that $\ln \sqrt{a} = \frac{1}{2} \ln a$.)
2K-2. For what value(s) of n will $w = (x^2 + y^2 + z^2)^n$ solve the 3-dimensional Laplace
equation (Notes P, (1))? Where have you seen this function in physics?
2K-3. The solutions in exercises 2K-1 and 2K-2 have circular and spherical symmetry,
respectively. But there are many other solutions. For example
a) find all solutions of the two-dimensional Laplace equation (see 2K-1) of the form
$w = ax^2 + bxy + cy^2$
and show they can all be written in the form $c_1 f_1(x, y) + c_2 f_2(x, y)$, where $c_1$, $c_2$ are
arbitrary constants, and $f_1$, $f_2$ are two particular polynomials; that is, all such solutions
are linear combinations of two particular polynomial solutions.
b)* Find and derive the analogue of part (a) for all of the cubic polynomial solutions
$ax^3 + bx^2 y + cxy^2 + dy^3$ to the two-dimensional Laplace equation.
2K-4. Show that the one-dimensional wave equation (Notes P, (4), first equation) is
satisfied by any function of the form
w = f (x + ct) + g(x − ct) ,
where f (u) and g(u) are arbitrary twice-differentiable functions of one variable.
Take g(u) = 0, and interpret physically the solution w = f (x + ct). What does f (x)
represent? What is the relation of f (x + ct) to it?
Note how this exercise shows that a solution to the wave equation can
involve completely arbitrary functions; this is also clear from the remarks about
the Laplace equation being solved by any gravitational or electrostatic potential
function in a mass- or charge-free region of space.
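A quick numerical experiment can make the claim in 2K-4 plausible before it is proved. The sketch below assumes the wave equation is written as w_tt = c^2 w_xx (the exact form in Notes P is not reproduced here), picks two arbitrary smooth functions f and g, and a made-up wave speed C; all names are hypothetical. It is a sanity check, not a substitute for the chain-rule argument the exercise asks for.

```python
import math

C = 2.0  # hypothetical wave speed c

def f(u):
    return math.sin(u)        # arbitrary twice-differentiable choice

def g(u):
    return math.exp(-u * u)   # another arbitrary choice

def w(x, t):
    return f(x + C * t) + g(x - C * t)

def second_diff(func, a, h=1e-4):
    """Central second-difference estimate of func''(a)."""
    return (func(a + h) - 2 * func(a) + func(a - h)) / (h * h)

def wave_residual(x, t):
    """w_tt - C^2 w_xx, which should vanish up to discretization error."""
    w_tt = second_diff(lambda s: w(x, s), t)
    w_xx = second_diff(lambda s: w(s, t), x)
    return w_tt - C * C * w_xx

print(wave_residual(0.3, 0.7))  # should be very close to 0
```

Swapping in other smooth choices for f and g should leave the residual near zero, which is the point of the remark about arbitrary functions.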
2K-5. Find solutions to the one-dimensional heat equation (Notes P, (5), first equation)
having the form
$w = e^{rt} \sin kx$ (k, r constants)
satisfying the additional conditions for all t:
w(0, t) = 0, w(1, t) = 0 .
Interpret your solutions physically. What happens to the temperature as t → ∞?

18.02 Problem Set 5
Part I (15 points)
1 Differentials. Chain rule
§2C: 1ad, 2, 3, 5ab; Others: 1bc
§2E: 1c, 2bc, 8a; Others: 1ab, 2d, 4, 5, 7
2 Gradient and directional derivatives

§2D: 1ae, 2b, 3a, 8, 9; Others: 1bc, 2a, 3b, 4, 5
§2E: 7
Part II (17 points)
Problem 1 (4: 1,2,1 )

In laminar flow in a cylinder (for example, blood flow in a vein or artery), the resistance
R to the flow is related to the length w and radius r of the cylinder by the law
of Poiseuille: $R = k \frac{w}{r^4}$ for some constant k.
a) Compute the linear approximation dR to the change in R, in terms of the changes
in w and r.
b) Compute the linear approximation $\frac{dR}{R}$ to the relative change in R in terms of
$\frac{dw}{w}$ = the relative change in w and $\frac{dr}{r}$ = the relative change in r.
c) For relative changes in w and r of about the same sizes, which variable contributes
more to the relative change in R ? Also, in order to produce the greatest relative
change in R, should the changes in w and r both have the same sign or opposite signs
(and why)?
Problem 2 (3)
Let f (x, y, z, t) be a smooth function, and let $\nabla f = \langle f_x, f_y, f_z \rangle$ be the gradient in the
space variables only. Let $r = r(t) = \langle x(t), y(t), z(t) \rangle$ be a smooth curve, and $v = r'(t)$;
and suppose we use the notation $\frac{Df}{Dt} = \frac{d}{dt} f(r(t), t)$.
Use the Chain Rule to show that $\frac{Df}{Dt} = \frac{\partial f}{\partial t} + v \cdot \nabla f$.
Background: The notation $\frac{D}{Dt}$ comes from the physics of fluid motion, where it is
called the convective derivative (or material or substantial derivative, and by several
other names), and means the rate of change along a moving path of some physical
quantity (scalar or vector) which is being transported by fluid currents.
In this macroscopic model, the fluid is pictured as a continuum of point masses rather
than as individual molecules. At a location (x, y, z) in space and a time t, the point
mass has a density ρ = ρ(x, y, z, t), and a velocity v = v(x, y, z, t). This means
that the vector v(x, y, z, t) points in direction tangent to the path of a particle at
(x, y, z, t) in the flow, and has magnitude equal to the instantaneous speed of the
particle located at that point and which is moving in the flow.
Now suppose that the curve r = r(t) is a path of a point mass in the flow, so that
(by definition) $r'(t) = v(r(t), t)$. The convective derivative $\frac{Df}{Dt}$ of f along this path is
the time rate of change of f using only the values of f (x, y, z, t) for which the space
variables (x, y, z) are restricted to the path $r(t) = \langle x(t), y(t), z(t) \rangle$ of a particle in the
flow. For this reason you will see the convective derivative described as the rate of
change of the quantity f “moving along the flow” or “moving with an element of the
fluid” (and other similar language).
Problem 3 (5: 1,2,2) (continuation)
Now take the case f = ρ, the density of the fluid. A fluid flow is called incompressible
if $\frac{D\rho}{Dt} = 0$.
As discussed above, this means that the mass density is constant along the paths of
the flow. Any substance (like water, at moderate pressures) which has the property
that its density is constant in all variables (x, y, z, t) will of course be incompressible,
which is the usual way one pictures something which cannot be compressed. However,
incompressibility is in general a property of the flow rather than just the fluid itself,
since it says only that the rate of change of the density moving along the flow is zero.
The following examples illustrate this.
a) Suppose that the density function depends only on time t but is constant in the
space variables (x, y, z), that is, ρ = ρ(t). Then show that the flow is incompressible
if and only if the density ρ(t) is constant in all the variables (x, y, z, t) (that is, the
constant-density case discussed above).
b) Next suppose instead that the density depends only on the space variables (x, y, z)
but not (explicitly) on t, so that ρ = ρ(x, y, z). An incompressible flow in this case is
called stratified.
Use the result of problem 2 to give the condition on ρ and v for stratified flow.
A flow is called steady if the density ρ and the velocity field v of the flow do not
depend explicitly on the time t, i.e. ρ = ρ(x, y, z) and v = v(x, y, z). In this case,
the term streamlines is used for the paths of the particles in the flow, since they keep
their same shapes over time.
c) Suppose one has a 2D stratified steady flow, so that ρ = ρ(x, y) and v = v(x, y),
and suppose also that the density varies only with the height y. Draw a picture of
the streamlines for such a flow. Then explain why they must follow this pattern,
and why the term “stratified” fits in this case.
(This could be, for example, a cross-section of a very regular ocean current, if it is an
incompressible steady flow whose density varies only with the depth.)
Problem 4 (5: 1,1,1,1,1)

For the linear function f (x, y) = 4 − x − 4y :
a) Sketch the portion of the graph in the first octant.
b) Compute the gradient of f .
c) Find the point on the level curve f (x, y) = 0 such that the line in the gradient
direction passes through the origin, and then sketch in the gradient at that point.
d) Compute the directional derivative of f in the direction w = −2i − j.
e) Sketch in the slope triangles for the rates of change of f in the gradient direction
and in the direction of w.
MIT OpenCourseWare

Fall 2010
18.02 Problem Set 5, Part II Solutions
Problem 1 $R = f(r, w) = k w r^{-4}$ (k constant)

(a) $dR = f_r\,dr + f_w\,dw = k\left(-4 w r^{-5}\,dr + r^{-4}\,dw\right)$.
(b) $\dfrac{dR}{R} = -4\,\dfrac{dr}{r} + \dfrac{dw}{w}$.
(c) $dR/R$ is more sensitive to $dr/r$, the relative change in r. Opposite signs in $dr/r$ and
$dw/w$ (or in dr and dw, since r, w > 0) will cause errors to add.
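As a numerical sanity check on (b) and (c), the following sketch compares the exact relative change in R with the differential estimate. The values k = 1, r = 2, w = 3 and the 1% perturbations are arbitrary choices made for illustration; they do not come from the problem.

```python
# Compare the exact relative change of R = k*w*r**-4 with the
# linear estimate dR/R = -4*dr/r + dw/w.
# k, r, w and the perturbation sizes are illustrative assumptions.

def R(r, w, k=1.0):
    return k * w * r**-4

def relative_change_exact(r, w, dr, dw):
    return (R(r + dr, w + dw) - R(r, w)) / R(r, w)

def relative_change_estimate(r, w, dr, dw):
    return -4 * dr / r + dw / w

r, w = 2.0, 3.0
# Opposite signs in dr and dw: the two error contributions add.
dr, dw = -0.01 * r, 0.01 * w

exact = relative_change_exact(r, w, dr, dw)
estimate = relative_change_estimate(r, w, dr, dw)
print("exact:", exact, "estimate:", estimate)
```

With a 1% error in each variable, the r term contributes four times as much as the w term, which is the sense in which R is more sensitive to r.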
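Problem 2
$\dfrac{Df}{Dt} = \dfrac{d}{dt} f(\mathbf{r}(t), t) = \dfrac{d}{dt} f(x(t), y(t), z(t), t)$
$= \dfrac{\partial f}{\partial x}\dfrac{dx}{dt} + \dfrac{\partial f}{\partial y}\dfrac{dy}{dt} + \dfrac{\partial f}{\partial z}\dfrac{dz}{dt} + \dfrac{\partial f}{\partial t}\dfrac{dt}{dt}$
$= \mathbf{r}'(t) \cdot \nabla f(\mathbf{r}(t)) + \dfrac{\partial f}{\partial t}$
$= \mathbf{v} \cdot \nabla f + \dfrac{\partial f}{\partial t}$,
using $\mathbf{v} = \mathbf{r}'(t)$.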
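The identity above can be checked numerically. In this sketch the function f(x, y, z, t) = xy + zt² and the path r(t) = (cos t, sin t, t) are hypothetical examples chosen only for the check; the code compares v · ∇f + ∂f/∂t against a central-difference estimate of d/dt f(r(t), t).

```python
import math

# Hypothetical test function and path (assumptions, not from the problem set).
def f(x, y, z, t):
    return x * y + z * t**2

def path(t):
    return (math.cos(t), math.sin(t), t)

def velocity(t):
    # v = r'(t) for the path above
    return (-math.sin(t), math.cos(t), 1.0)

def grad_f(x, y, z, t):
    # spatial gradient <f_x, f_y, f_z>
    return (y, x, t**2)

def df_dt_partial(x, y, z, t):
    # explicit partial derivative with respect to t
    return 2 * z * t

def material_derivative(t):
    x, y, z = path(t)
    v = velocity(t)
    g = grad_f(x, y, z, t)
    return sum(vi * gi for vi, gi in zip(v, g)) + df_dt_partial(x, y, z, t)

def total_derivative_numeric(t, h=1e-6):
    # central difference of t -> f(r(t), t)
    return (f(*path(t + h), t + h) - f(*path(t - h), t - h)) / (2 * h)

t0 = 0.7
print(material_derivative(t0), total_derivative_numeric(t0))
```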
Problem 3 $\dfrac{D\rho}{Dt} = \dfrac{\partial \rho}{\partial t} + \mathbf{v} \cdot \nabla\rho$
(a) If ρ = ρ(t) only, then $\nabla\rho = \langle \rho_x, \rho_y, \rho_z \rangle = 0$. Thus $\dfrac{D\rho}{Dt} = 0$ if and only if
$\dfrac{\partial \rho}{\partial t} = 0$.
(b) If $\dfrac{\partial \rho}{\partial t} = 0$, then $\dfrac{D\rho}{Dt} = 0$ if and only if $\mathbf{v} \cdot \nabla\rho = 0$. So the condition
for stratified flow is that the velocity vectors of the flow are orthogonal to
the density gradients, or, equivalently, tangent to the surfaces of constant
density.
(c) If ρ = ρ(y) only, then $\nabla\rho = \langle 0, \rho_y \rangle$, so that the gradient of the density
is always parallel to j. Therefore, by the result of part (b), the streamlines,
which follow the velocity vectors v, are always horizontal. The flow is thus
layered by density, which is consistent with the meaning of the word stratified.
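A small numerical illustration of part (c), under assumed data: the density ρ(x, y) = 1 − 0.1y and the two velocity vectors below are made up for the example. For a steady density, Dρ/Dt reduces to v · ∇ρ, which vanishes for a horizontal velocity and not for a tilted one.

```python
# Hypothetical density depending only on the height y (an assumption).
def rho(x, y):
    return 1.0 - 0.1 * y

def grad_rho(x, y, h=1e-6):
    # central-difference estimate of <rho_x, rho_y>
    return ((rho(x + h, y) - rho(x - h, y)) / (2 * h),
            (rho(x, y + h) - rho(x, y - h)) / (2 * h))

def material_derivative_steady(v, x, y):
    # For steady rho the time partial is zero, so D(rho)/Dt = v . grad(rho).
    gx, gy = grad_rho(x, y)
    return v[0] * gx + v[1] * gy

horizontal = material_derivative_steady((2.0, 0.0), 1.0, 3.0)
tilted = material_derivative_steady((1.0, 1.0), 1.0, 3.0)
print("horizontal flow:", horizontal, "tilted flow:", tilted)
```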
Problem 4. (a) and (e) – see picture:
(b) We compute
$\nabla f(x, y) = \langle f_x, f_y \rangle = \langle -1, -4 \rangle$.
(c) The level curve for f = 0 is given by
$x + 4y = 4$.
We are looking for a point (x, y) that lies on the line that passes through the
origin in the gradient direction, i.e.,
$\langle x, y \rangle = \langle 0, 0 \rangle + s\,\langle -1, -4 \rangle$.
Thus x = −s and y = −4s = 4x. Plugging y = 4x into the level curve for
f = 0 gives
$x + 16x = 4$,
or x = 4/17 and y = 16/17.
(d) The directional derivative is given by
$\nabla f(x, y) \cdot \dfrac{\mathbf{w}}{|\mathbf{w}|} = \langle -1, -4 \rangle \cdot \dfrac{\langle -2, -1 \rangle}{\sqrt{5}} = \dfrac{6}{\sqrt{5}}$.
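Parts (c) and (d) can be verified with a few lines of arithmetic. This is only a sketch: the variable names are illustrative, and the value of s is obtained by substituting the line s⟨−1, −4⟩ into f = 0.

```python
import math

def f(x, y):
    return 4 - x - 4 * y

grad = (-1.0, -4.0)  # gradient of f; constant because f is linear

# (c) Points on the line through the origin in the gradient direction
# have the form s*grad. Setting f(s*grad) = 4 + 17*s = 0 gives s = -4/17.
s = -4 / 17
point = (s * grad[0], s * grad[1])

# (d) Directional derivative in the direction of w = <-2, -1>.
w = (-2.0, -1.0)
w_norm = math.hypot(*w)
directional = (grad[0] * w[0] + grad[1] * w[1]) / w_norm

print("point on level curve:", point, "f there:", f(*point))
print("directional derivative:", directional)
```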
18.02 Problem Set 6
Part I (10 points)
1 Lagrange multipliers.
§2I: 1, 3; Others: 2, 4
2 Non-independent variables.
§2J: 1, 2, 3a, 4b, 5a, 7; Others: 3b, 4a, 6
Part II (15 points)
Problem 1 (4: 2,1,1)

Go to the ‘Mathlet’ Lagrange Multipliers (with link on the course webpage), and
choose $f(x, y) = x^2 - y^2$ and $g(x, y) = x^2 + y^2$.
a) Solve by hand to find the two values of λ and the possibilities for the corresponding
points (x, y) at which the gradients are proportional. Then check these possibilities on
the applet and verify the predicted proportionality on the graph.
b) Now take b = 3 and finish the solution of part (a) by hand to find the possible points
which may give a relative extremum of f . Then return to the applet, set b = 3, move
the f -levels until they make contact with the g = 3 constraint curve, and read the
values of f at the points of contact. Compare with the results found by hand; how
close could you get?
c) What do the two values of λ correspond to in terms of the pairs of solution points?
Do the gradients of f and g point in the same or the opposite direction at the contact
points in the two different cases, and is this consistent with the signs of λ?
Problem 2 (4: 2,2)

In this problem we examine how electricity flows through circuits to minimize energy.
A current I flowing through a resistor R results in an energy loss (in the form of heat/light)
equal to $I^2 R$ per second. It turns out that, in a sense, "electricity prefers to flow in
the way that minimizes energy loss to resistance". For example, when an electric
current comes to a fork, it will divide itself up in such a way that a large portion
of the current flows where the resistance is low and a small portion flows where the
resistance is high (you might think all the electricity would flow where the resistance
is low, but the energy loss is proportional to $I^2$, so it is better to spread the current
around).
Suppose we have the following situation where a current I comes to a pair of resistors
in parallel:
a) Determine what choice of I1 and I2 will minimize energy loss and hence determine
what the currents will be along the two paths. (Alternatively, if you already are
familiar with resistors in parallel and current flows, verify that the currents I1 and I2
do in fact minimize energy loss).
b) Suppose instead we had three resistors in parallel. In terms of R1, R2, and R3,
determine the values of I1, I2, and I3 which minimize energy loss.
Problem 3 (7: 1,2,2,2)
Using the usual rectangular and polar coordinates, let w be the area of the right
triangle in the first quadrant having its vertices at (0, 0), (x, 0) and (x, y). Using the
equation expressing w in terms of x and y and the equations expressing y in terms of
x and θ, calculate the two partial derivatives $\left(\dfrac{\partial w}{\partial x}\right)_\theta$ and $\left(\dfrac{\partial w}{\partial \theta}\right)_x$ in three different
ways.
a) Directly, by first expressing w in terms of the independent variables x and θ.
b) By using the chain rule – for example $\left(\dfrac{\partial w}{\partial x}\right)_\theta = w_x \left(\dfrac{\partial x}{\partial x}\right)_\theta + w_y \left(\dfrac{\partial y}{\partial x}\right)_\theta$, where
$w_x$ and $w_y$ are the formal partial derivatives.
c) By using differentials.
d) Using the triangle picture and geometric intuition, estimate $\left(\dfrac{\Delta w}{\Delta x}\right)_\theta$ and $\left(\dfrac{\Delta w}{\Delta \theta}\right)_x$
and show they agree with the two corresponding partial derivatives.
MIT OpenCourseWare

Fall 2010
18.02 Problem Set 6, Part II Solutions

Problem 1 (a) $f(x, y) = x^2 - y^2$, $\nabla f = 2\langle x, -y \rangle$; $g(x, y) = x^2 + y^2$, $\nabla g = 2\langle x, y \rangle$.
$\nabla f = \lambda \nabla g \Rightarrow \langle x, -y \rangle = \lambda \langle x, y \rangle$, or x = λx, y = −λy.
Two possibilities: x ≠ 0 → λ = 1 → y = 0; y ≠ 0 → λ = −1 → x = 0. So
$\nabla f = \nabla g$ for all non-zero points on the x-axis (λ = 1) and $\nabla f = -\nabla g$ for
all non-zero points on the y-axis (λ = −1).
(b) $g(x, y) = x^2 + y^2 = 3$.
y = 0, $x = \pm\sqrt{3} \approx \pm 1.732$: $(\pm\sqrt{3}, 0) \approx (\pm 1.732, 0)$, λ = 1.
x = 0, $y = \pm\sqrt{3} \approx \pm 1.732$: $(0, \pm\sqrt{3}) \approx (0, \pm 1.732)$, λ = −1.
(c) λ = +1: x-axis contact points, $f = x^2 - y^2 = 3$ (y = 0), $x = \pm\sqrt{3}$; the two
gradients point in the same direction (λ > 0).
λ = −1: y-axis contact points, $f = x^2 - y^2 = -3$ (x = 0), $y = \pm\sqrt{3}$; the two
gradients point in the opposite direction (λ < 0).
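The hand computation can be cross-checked by brute force, in the spirit of reading values off the applet. The sketch below samples the constraint circle x² + y² = 3 and records where f is largest and smallest; the sampling resolution n is an arbitrary assumption.

```python
import math

def f(x, y):
    return x**2 - y**2

radius = math.sqrt(3.0)  # constraint g = x^2 + y^2 = 3
n = 3600                 # number of sample angles (arbitrary resolution)

samples = []
for i in range(n):
    phi = 2 * math.pi * i / n
    x, y = radius * math.cos(phi), radius * math.sin(phi)
    samples.append((f(x, y), x, y))

# Tuples compare by their first entry, so max/min pick extreme f values.
f_max, x_max, y_max = max(samples)
f_min, x_min, y_min = min(samples)
print("max f:", f_max, "at", (x_max, y_max))
print("min f:", f_min, "at", (x_min, y_min))
```

The maximum lands on the x-axis and the minimum on the y-axis, matching the λ = 1 and λ = −1 cases.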
Problem 2
a) We want to minimize
$I_1^2 R_1 + I_2^2 R_2$
subject to
$I_1 + I_2 = I$,
where I is a constant. Using Lagrange multipliers we get the equations
$2 I_1 R_1 = \lambda, \quad 2 I_2 R_2 = \lambda, \quad I_1 + I_2 = I$,
which we solve to get
$I_1 = \dfrac{R_2}{R_1 + R_2}\, I, \qquad I_2 = \dfrac{R_1}{R_1 + R_2}\, I.$
(If you are familiar with circuits, note that λ/2 = I1 R1 = I2 R2 is none other than the voltage across the resistors!)
b) We want to minimize
$I_1^2 R_1 + I_2^2 R_2 + I_3^2 R_3$
subject to
$I_1 + I_2 + I_3 = I$,
where I is a constant. Using Lagrange multipliers we get the equations
$2 I_1 R_1 = \lambda, \quad 2 I_2 R_2 = \lambda, \quad 2 I_3 R_3 = \lambda, \quad I_1 + I_2 + I_3 = I$,
which we solve to get
$I_1 = \dfrac{R_2 R_3}{D}\, I, \qquad I_2 = \dfrac{R_1 R_3}{D}\, I, \qquad I_3 = \dfrac{R_2 R_1}{D}\, I,$
where $D = R_1 R_3 + R_2 R_3 + R_1 R_2$.
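The Lagrange equations say each I_k equals λ/(2R_k), so the currents are proportional to 1/R_k. The following sketch uses that observation for any number of parallel resistors and checks it against the three-resistor formulas above. The resistances 1, 2, 4 and the total current 7 are made-up test values, and the function names are illustrative.

```python
def optimal_currents(resistances, total_current):
    # Each current is proportional to 1/R_k and the currents sum to I.
    conductances = [1.0 / r for r in resistances]
    total = sum(conductances)
    return [total_current * g / total for g in conductances]

def energy_loss(currents, resistances):
    # Sum of I_k^2 * R_k over all branches.
    return sum(i**2 * r for i, r in zip(currents, resistances))

# Made-up test values (assumptions for illustration only).
R1, R2, R3 = 1.0, 2.0, 4.0
I = 7.0
I1, I2, I3 = optimal_currents([R1, R2, R3], I)

# Closed-form answer from the solution above.
D = R1 * R3 + R2 * R3 + R1 * R2
closed_form = [R2 * R3 / D * I, R1 * R3 / D * I, R2 * R1 / D * I]

print("currents:", (I1, I2, I3), "closed form:", closed_form)
print("energy at optimum:", energy_loss([I1, I2, I3], [R1, R2, R3]))
```

Shifting a little current from one branch to another while keeping the total fixed, for example (4.1, 1.9, 1.0) in place of the optimum here, gives a larger energy loss.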
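Problem 3
[Figures omitted. Fig. 1: the right triangle with vertices (0, 0), (x, 0), (x, y), hypotenuse r and angle θ. Fig. 2: θ fixed, x increased by Δx. Fig. 3: x fixed, θ increased, with hypotenuse r + Δr.]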
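a) y = x tan θ (see Fig. 1). Area $= w = \tfrac{1}{2} x y = \tfrac{1}{2} x^2 \tan\theta$.
$\Rightarrow \left(\dfrac{\partial w}{\partial x}\right)_\theta = x \tan\theta = y$, and $\left(\dfrac{\partial w}{\partial \theta}\right)_x = \tfrac{1}{2} x^2 \sec^2\theta$.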
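b) As before, y = x tan θ and $w_x = \tfrac{1}{2} y$, $w_y = \tfrac{1}{2} x$.
$\left(\dfrac{\partial w}{\partial x}\right)_\theta = w_x \left(\dfrac{\partial x}{\partial x}\right)_\theta + w_y \left(\dfrac{\partial y}{\partial x}\right)_\theta = \tfrac{1}{2} y + \tfrac{1}{2} x \tan\theta = \tfrac{1}{2} y + \tfrac{1}{2} y = y,$
$\left(\dfrac{\partial w}{\partial \theta}\right)_x = w_x \left(\dfrac{\partial x}{\partial \theta}\right)_x + w_y \left(\dfrac{\partial y}{\partial \theta}\right)_x = 0 + \tfrac{1}{2} x^2 \sec^2\theta = \tfrac{1}{2} x^2 \sec^2\theta.$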
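c) $dw = \tfrac{1}{2} y\,dx + \tfrac{1}{2} x\,dy$, $dy = \tan\theta\,dx + x \sec^2\theta\,d\theta$.
Eliminate dy from the equation for dw.
$dw = \tfrac{1}{2} y\,dx + \tfrac{1}{2} x(\tan\theta\,dx + x \sec^2\theta\,d\theta) = \left(\tfrac{1}{2} y + \tfrac{1}{2} x \tan\theta\right) dx + \left(\tfrac{1}{2} x^2 \sec^2\theta\right) d\theta.$
$\Rightarrow \left(\dfrac{\partial w}{\partial x}\right)_\theta = \tfrac{1}{2} y + \tfrac{1}{2} x \tan\theta = y$, and $\left(\dfrac{\partial w}{\partial \theta}\right)_x = \tfrac{1}{2} x^2 \sec^2\theta$.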
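d) If we fix θ and vary x then (see Fig. 2)
$\Delta w$ = area of trapezoidal strip at right $= \Delta x \cdot \tfrac{1}{2}(y + y + \Delta y) = y\,\Delta x + \tfrac{1}{2}\,\Delta x \cdot \Delta y \approx y\,\Delta x$.
(We ignore second order terms.) $\Rightarrow \left(\dfrac{\Delta w}{\Delta x}\right)_\theta \approx y \Rightarrow \left(\dfrac{\partial w}{\partial x}\right)_\theta = y$.
If we fix x and vary θ then (see Fig. 3) Δw = area of thin wedge.
The angle of the wedge is Δθ and $\Delta w = \tfrac{1}{2} r (r + \Delta r) \sin(\Delta\theta) \approx \tfrac{1}{2} r (r + \Delta r)\,\Delta\theta \approx \tfrac{1}{2} r^2\,\Delta\theta$.
(Here, we’ve used sin x ≈ x and then dropped second order terms.)
$\Rightarrow \left(\dfrac{\Delta w}{\Delta \theta}\right)_x \approx \tfrac{1}{2} r^2 = \tfrac{1}{2} x^2 \sec^2\theta \Rightarrow \left(\dfrac{\partial w}{\partial \theta}\right)_x = \tfrac{1}{2} x^2 \sec^2\theta$.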
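All four methods give the same two partial derivatives, and a finite-difference check agrees with them. In this sketch the sample point x = 2, θ = π/6 and the step size h are arbitrary assumptions.

```python
import math

def w_of_x_theta(x, theta):
    # Area of the triangle in terms of the independent variables x and theta.
    return 0.5 * x**2 * math.tan(theta)

def partial_x_fixed_theta(x, theta, h=1e-6):
    # Central difference in x with theta held fixed.
    return (w_of_x_theta(x + h, theta) - w_of_x_theta(x - h, theta)) / (2 * h)

def partial_theta_fixed_x(x, theta, h=1e-6):
    # Central difference in theta with x held fixed.
    return (w_of_x_theta(x, theta + h) - w_of_x_theta(x, theta - h)) / (2 * h)

x, theta = 2.0, math.pi / 6   # arbitrary sample point
y = x * math.tan(theta)

print(partial_x_fixed_theta(x, theta), "vs y =", y)
print(partial_theta_fixed_x(x, theta), "vs", 0.5 * x**2 / math.cos(theta)**2)
```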