
LINEAR ALGEBRA AND VECTOR ANALYSIS

MATH 22B

Unit 17: Taylor approximation

Introduction
17.1. According to legend¹, Richard Feynman was challenged to compute the
cube root of 1729.03, racing against an abacus computation. By using linear approximation
and a bit of luck, he got 12.002384 using paper and pencil. The actual cube
root is 12.002383785691718123057. How did Feynman do it? The secret is linear
approximation. This means that we approximate a function like f(x) = x^{1/3} with a
linear function. The same can be done with functions of several variables. The linear
approximation is of the form L(x) = f(a) + f'(a)(x − a).
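As a numerical illustration (a sketch added here, not part of the original notes), the trick can be reproduced in a few lines of Python, expanding f(x) = x^{1/3} around a = 1728 = 12³:

```python
# Linear approximation L(x) = f(a) + f'(a)(x - a) for f(x) = x**(1/3),
# expanded around a = 1728 = 12**3, where the cube root is exactly 12.
a, fa = 1728.0, 12.0
fprime = 1.0 / (3.0 * fa**2)        # f'(a) = (1/3) a**(-2/3) = 1/(3 * 12**2)

x = 1729.03
L = fa + fprime * (x - a)
print(L)                 # 12.002384259... (Feynman reported 12.002384)
print(x ** (1.0 / 3.0))  # 12.002383785... (the actual cube root)
```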

Figure 1. The Abacus scene in the movie “Infinity”.

17.2. One can also do higher order approximations. The function f(x) = e^x, for
example, has the linear approximation L(x) = 1 + x at a = 0 and the quadratic
approximation Q(x) = 1 + x + x²/2 at a = 0. To get the quadratic term, we just
need to make sure that the first and second derivatives at x = a agree. This gives the
formula Q(x) = f(a) + f'(a)(x − a) + f''(a)(x − a)²/2. Indeed, you can check that f(x)
and Q(x) have the same value, the same first derivative, and the same second derivative at x = a.

¹ Feynman's book "What Do You Care What Other People Think?"



A degree n approximation is then the polynomial

$$P_n(x) = \sum_{k=0}^{n} f^{(k)}(a)\,\frac{(x-a)^k}{k!}.$$

For the function e^x, for example, we have the n-th order approximation

$$e^x \approx 1 + x + x^2/2! + x^3/3! + \cdots + x^n/n! .$$
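A minimal sketch (added for illustration) comparing these Taylor polynomials of e^x with the true value:

```python
from math import exp, factorial

def P(n, x):
    """Degree-n Taylor polynomial of exp at a = 0."""
    return sum(x**k / factorial(k) for k in range(n + 1))

x = 1.0
for n in (1, 2, 5, 10):
    print(n, P(n, x), abs(exp(x) - P(n, x)))  # the error shrinks rapidly with n
```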

17.3. The same can be done in higher dimensions. Everything is the same; we just
have to use the derivative df rather than the usual derivative f'. We look here only at
linear and quadratic approximations of functions R^n → R. The linear approximation is
then

L(x) = f(a) + ∇f(a)(x − a),

where ∇f(a) = df(a) = [f_{x_1}(a), . . . , f_{x_n}(a)] is the Jacobian matrix, which is a row
vector. Now, since we can see df(x) : R^n → R^n, the second derivative is a matrix
d²f(x) = H(x). It is called the Hessian. It encodes all the second derivatives H_{ij}(x) =
f_{x_i x_j}.
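To make the gradient and the Hessian concrete, here is a sketch (added for illustration) that approximates both by finite differences; the polynomial g is the function that reappears in Example 17.11 below:

```python
import numpy as np

def grad(f, x, h=1e-6):
    """Finite-difference approximation of the gradient of f: R^n -> R."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def hessian(f, x, h=1e-4):
    """Finite-difference approximation of H_ij(x) = f_{x_i x_j}(x)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        ei = np.zeros(n); ei[i] = h
        for j in range(n):
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

g = lambda p: p[0]**3 * p[1]**2 + p[0] + p[1]**3
print(grad(g, [1, 1]))     # ~ [4, 5]
print(hessian(g, [1, 1]))  # ~ [[6, 6], [6, 8]]
```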

Lecture
17.4. Given a function f : R^m → R^n, its derivative df(x) is the Jacobian matrix. For
every x ∈ R^m, we can use the matrix df(x) and a vector v ∈ R^m to get D_v f(x) =
df(x)v ∈ R^n. For fixed v, this defines a map x ∈ R^m → df(x)v ∈ R^n, like the original
f. Because D_v is a map on X = { all functions from R^m → R^n }, one calls it an
operator. The Taylor formula f(x + t) = e^{Dt} f(x) holds in arbitrary dimensions:

Theorem: $f(x + tv) = e^{D_v t} f(x) = f(x) + \frac{D_v t\, f(x)}{1!} + \frac{D_v^2 t^2\, f(x)}{2!} + \cdots$

17.5. Proof. It is the single variable Taylor formula on the line x + tv. The directional derivative
D_v f is there the usual derivative: lim_{t→0} [f(x + tv) − f(x)]/t = D_v f(x). Technically,
we also need the sum to converge; it does for functions built from polynomials, sin, cos, and exp.
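The operator formula can be checked symbolically; here is a sketch (added here, using sympy, with an arbitrarily chosen cubic f = x²y and direction v = (3, 1)) where the series terminates because f is a polynomial:

```python
from sympy import symbols, diff, simplify

x, y, t = symbols('x y t')
f = x**2 * y                 # an arbitrarily chosen polynomial f: R^2 -> R
vx, vy = 3, 1                # direction v

Dv = lambda h: vx * diff(h, x) + vy * diff(h, y)   # the operator D_v

total, g, fact = f, f, 1
for k in range(1, 4):        # deg f = 3, so the series stops at k = 3
    g = Dv(g)
    fact *= k
    total += g * t**k / fact

lhs = f.subs({x: x + vx * t, y: y + vy * t})   # f(x + t v)
print(simplify(lhs - total))                   # 0: the operator series matches
```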

17.6. The Taylor formula can also be written down using successive derivatives df, d²f, d³f,
which are then called tensors. In the scalar case n = 1, the first derivative df(x)
leads to the gradient ∇f(x), and the second derivative d²f(x) to the Hessian matrix
H(x), which is a bilinear form acting on pairs of vectors. The third derivative d³f(x)
then acts on triples of vectors, etc. One can still write, as in one dimension,
Theorem: $f(x) = f(x_0) + f'(x_0)(x - x_0) + f''(x_0)\frac{(x - x_0)^2}{2!} + \cdots$

if we write f^{(k)} = d^k f. For a polynomial, this just means that we first write down the
constant, then all linear terms, then all quadratic terms, then all cubic terms, etc.
17.7. Assume f : R^m → R and stop the Taylor series after the first step. We get
L(x_0 + v) = f(x_0) + ∇f(x_0) · v.
It is customary to write this with x = x_0 + v, v = x − x_0 as

L(x) = f(x_0) + ∇f(x_0) · (x − x_0).

This function is called the linearization of f. The kernel of L − f(x_0) is a linear
manifold approximating the surface {x | f(x) − f(x_0) = 0}. If f : R^m → R^n, then what
was just said can be applied to every component f_i of f, with 1 ≤ i ≤ n. One cannot
stress enough the importance of this linearization.²

17.8. If we stop the Taylor series after two steps, we get the function Q(x_0 + v) =
f(x_0) + df(x_0) · v + v · d²f(x_0) · v/2. The matrix H(x) = d²f(x) is called the Hessian
matrix at the point x. It is also customary here to eliminate v by writing x = x_0 + v:

Q(x) = f(x_0) + ∇f(x_0) · (x − x_0) + (x − x_0) · H(x_0)(x − x_0)/2

is called the quadratic approximation of f. The kernel of Q − f(x_0) is the quadratic
manifold Q(x) − f(x_0) = x · Bx + Ax = 0, where A = df and B = d²f/2. It
approximates the surface {x | f(x) − f(x_0) = 0} even better than the linear one. If
|x − x_0| is of the order ε, then |f(x) − L(x)| is of the order ε² and |f(x) − Q(x)| is of
the order ε³. This follows from the exact Taylor formula with remainder.³
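These orders are easy to verify numerically; here is a sketch (added for illustration) using the one-dimensional example f(x) = e^x at x_0 = 0, where halving ε should shrink |f − L| by a factor of 4 and |f − Q| by a factor of 8:

```python
import math

f = lambda x: math.exp(x)        # sample function, expanded at x0 = 0
L = lambda x: 1 + x              # linearization
Q = lambda x: 1 + x + x**2 / 2   # quadratic approximation

for eps in (0.1, 0.05, 0.025):
    print(eps, abs(f(eps) - L(eps)), abs(f(eps) - Q(eps)))
# |f - L| ~ eps^2 (factor ~4 per halving), |f - Q| ~ eps^3 (factor ~8).
```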

Figure 2. The manifolds f(x, y) = C, L(x, y) = C and Q(x, y) = C
for C = f(x_0, y_0) pass through the point (x_0, y_0). To the right, we see
the situation for f(x, y, z) = C, with the best linear approximation
and quadratic approximation. The gradient is perpendicular.

17.9. To get the tangent plane to a surface f(x) = C, one can just look at the linear
manifold L(x) = C. However, there is a better method:

The tangent plane to a surface f(x, y, z) = C at (x_0, y_0, z_0) is ax + by + cz = d,
where [a, b, c]^T = ∇f(x_0, y_0, z_0) and d = ax_0 + by_0 + cz_0.
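A minimal sketch of this recipe (added for illustration, with the hypothetical example f(x, y, z) = x² + y² + z² and level C = 3 at the point (1, 1, 1)):

```python
import numpy as np

def tangent_plane(grad_f, p):
    """Return (a, b, c, d) so that ax + by + cz = d is tangent to f = C at p."""
    a, b, c = grad_f(p)
    d = float(np.dot([a, b, c], p))   # d = a*x0 + b*y0 + c*z0
    return a, b, c, d

grad_f = lambda p: 2 * np.asarray(p, dtype=float)   # gradient of x^2 + y^2 + z^2
print(tangent_plane(grad_f, [1.0, 1.0, 1.0]))       # (2, 2, 2, 6): 2x + 2y + 2z = 6
```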

² Again: the linearization idea is of utmost importance because it brings in linear algebra.
³ If f ∈ C^{n+1}, then $f(x+t) = \sum_{k=0}^{n} f^{(k)}(x)\,t^k/k! + \int_0^t (t-s)^n f^{(n+1)}(x+s)\,ds/n!$ (prove this by induction!)

17.10. This follows from the fundamental theorem of gradients:


Theorem: The gradient ∇f(x_0) of f : R^m → R is perpendicular to the
surface S = {f(x) = f(x_0) = C} at x_0.

Proof. Let r(t) be a curve on S with r(0) = x_0. The chain rule gives d/dt f(r(t)) =
∇f(r(t)) · r′(t). But because f(r(t)) = C is constant, this derivative is zero, so r′(t) is
perpendicular to the gradient. As this works for any curve in S, we are done.
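A quick numerical check (added here): on the level curve r(t) = [cos t, sin t]^T of f(x, y) = x² + y², the velocity r′(t) is indeed orthogonal to the gradient:

```python
import numpy as np

t = 0.7
r  = np.array([np.cos(t), np.sin(t)])    # a curve on the level set f = 1
rp = np.array([-np.sin(t), np.cos(t)])   # its velocity r'(t)
grad = 2 * r                             # gradient of f(x, y) = x^2 + y^2 at r(t)
print(np.dot(grad, rp))                  # 0 up to rounding
```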
Examples
17.11. Let f : R² → R be given as f(x, y) = x³y² + x + y³. What is the quadratic
approximation at (x_0, y_0) = (1, 1)? We have df(1, 1) = [4, 5] and

$$\nabla f(1,1) = \begin{bmatrix} f_x \\ f_y \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \end{bmatrix}, \qquad H(1,1) = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix} = \begin{bmatrix} 6 & 6 \\ 6 & 8 \end{bmatrix}.$$
The linearization is L(x, y) = 4(x − 1) + 5(y − 1) + 3. The quadratic approximation
is Q(x, y) = 3 + 4(x − 1) + 5(y − 1) + 6(x − 1)²/2 + 12(x − 1)(y − 1)/2 + 8(y − 1)²/2.
This is the situation displayed to the left in Figure 2. For v = [7, 2]^T, the directional
derivative is D_v f(1, 1) = ∇f(1, 1) · v = [4, 5] · [7, 2]^T = 38. The Taylor expansion
from the theorem in 17.4 is a finite series because f is a polynomial: f([1, 1]^T + t[7, 2]^T) =
f(1 + 7t, 1 + 2t) = 3 + 38t + 247t² + 1023t³ + 1960t⁴ + 1372t⁵.
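This expansion is quick to verify with a computer algebra system; a sketch (added here, assuming sympy is available):

```python
from sympy import symbols, expand

x, y, t = symbols('x y t')
f = x**3 * y**2 + x + y**3
print(expand(f.subs({x: 1 + 7*t, y: 1 + 2*t})))
# 1372*t**5 + 1960*t**4 + 1023*t**3 + 247*t**2 + 38*t + 3
```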
17.12. For f(x, y, z) = −x⁴ + x² + y² + z², the gradient and Hessian at (1, 1, 1) are

$$\nabla f(1,1,1) = \begin{bmatrix} f_x \\ f_y \\ f_z \end{bmatrix} = \begin{bmatrix} -2 \\ 2 \\ 2 \end{bmatrix}, \qquad H(1,1,1) = \begin{bmatrix} f_{xx} & f_{xy} & f_{xz} \\ f_{yx} & f_{yy} & f_{yz} \\ f_{zx} & f_{zy} & f_{zz} \end{bmatrix} = \begin{bmatrix} -10 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}.$$

(Note f_x(1, 1, 1) = −4 + 2 = −2, matching the linearization below.)
The linearization is L(x, y, z) = 2 − 2(x − 1) + 2(y − 1) + 2(z − 1). The quadratic
approximation
Q(x, y, z) = 2 − 2(x − 1) + 2(y − 1) + 2(z − 1) + (−10(x − 1)² + 2(y − 1)² + 2(z − 1)²)/2
is the situation displayed to the right in Figure 2.
17.13. What is the tangent plane to the surface f(x, y, z) = 1/10 for

f(x, y, z) = 10z² − x² − y² + 100x⁴ − 200x⁶ + 100x⁸ − 200x²y² + 200x⁴y² + 100y⁴

at the point (x, y, z) = (0, 0, 1/10)? The gradient is ∇f(0, 0, 1/10) = [0, 0, 2]^T. The
tangent plane equation is 2z = d, where the constant d is obtained by plugging in the
point. We end up with 2z = 2/10. The linearization is L(x, y, z) = 1/10 + 2(z − 1/10).

Homework

Problem 16-17.1: Let r(t) = [3t + cos(t), t + 4 sin(t)]^T be a curve and
f([x, y]^T) = [x³ + y, x + 2y + y³]^T be a coordinate change.
a) Compute v = r′(0) at t = 0, then df(x, y) and A = df(r(0)) and
df(r(0))r′(0) = Av.
b) Compute R(t) = f(r(t)) first, then find w = R′(0). It should agree
with a).

Problem 16-17.2: a) The surface

f(x, y, z) = x² + y²/4 + z²/9 = 4 + 1/4 + 1/9

is an ellipsoid. Compute z_x(x, y) at the point (x, y, z) = (2, 1, 1) using the
implicit differentiation rule. (Use the formula.)
b) Apply the Newton step 3 times starting with x = 2 to solve the equation
x² − 2 = 0.

Problem 16-17.3: Evaluate, without technology, the cube root of 1002
using quadratic approximation. In particular, check how close you get to the
true value.

Problem 16-17.4: a) Find the tangent plane to the surface f(x, y, z) =
√(xyz) = 60 at (x, y, z) = (100, 36, 1). b) Estimate √(100.1 · 36.1 · 0.999)
using linear approximation (compute L(x, y, z) rather than f(x, y, z)).

Problem 16-17.5: Find the quadratic approximation Q(x, y) of
f(x, y) = x³ + x²y + x² + y² − 2x + 3xy at the point (1, 2) by computing
the gradient vector ∇f(1, 2) and the Hessian matrix H(1, 2). The vector
∇f(1, 2) is a 1 × 2 matrix (a row vector) and the Hessian matrix H(1, 2) is
a 2 × 2 matrix.

Oliver Knill, [email protected], Math 22b, Harvard College, Spring 2022
