Math 4 Week 4 UvA Solutions

The document contains exercises and solutions related to differential calculus, focusing on the computation of derivatives for various functions. It includes examples of functions from $\mathbb{R}$ to $\mathbb{R}^2$, $\mathbb{R}^2$ to $\mathbb{R}$, and affine functions, along with recursive definitions and applications of the chain rule. Additionally, it discusses the product rule for differentiable mappings and provides detailed calculations for specific cases.

3 Differential calculus

Exercise 3.1. For each of the following functions determine the derivative $Df$:
(a) $f : \mathbb{R} \to \mathbb{R}^2$, $f(t) = (e^t \cos t,\ e^t \sin t)$
(b) $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x, y) = e^x(\cos y + \sin y)$
(c) $f : \mathbb{R}^2 \to \mathbb{R}^2$, $f(x, y) = (e^x \cos y,\ e^x \sin y)$
(d) $f : \mathbb{R}^2 \to \mathbb{R}^3$, $f(x, y) = (y^2,\ x \cos y,\ y \sin x)$

Solution:

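(a)
\[ f'(t) = \begin{pmatrix} \frac{d}{dt}(e^t \cos t) \\ \frac{d}{dt}(e^t \sin t) \end{pmatrix} = \begin{pmatrix} e^t(\cos t - \sin t) \\ e^t(\sin t + \cos t) \end{pmatrix} \]
(b)
\[ Df(x, y) = \begin{pmatrix} D_x f(x, y) & D_y f(x, y) \end{pmatrix} = \begin{pmatrix} e^x(\cos y + \sin y) & e^x(-\sin y + \cos y) \end{pmatrix} \]
(c)
\[ Df(x, y) = \begin{pmatrix} Df_1(x, y) \\ Df_2(x, y) \end{pmatrix} = \begin{pmatrix} e^x \cos y & -e^x \sin y \\ e^x \sin y & e^x \cos y \end{pmatrix} \]
(d)
\[ Df(x, y) = \begin{pmatrix} Df_1(x, y) \\ Df_2(x, y) \\ Df_3(x, y) \end{pmatrix} = \begin{pmatrix} 0 & 2y \\ \cos y & -x \sin y \\ y \cos x & \sin x \end{pmatrix} \]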
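
A Jacobian like the one in part (d) can be sanity-checked numerically. The following is a minimal sketch, not part of the original solutions: the helper `numerical_jacobian`, the step size and the test point are all assumptions chosen for illustration. It compares the closed-form matrix with a central-difference approximation.

```python
import math

def f(v):
    """Exercise 3.1(d): f(x, y) = (y^2, x cos y, y sin x)."""
    x, y = v
    return [y ** 2, x * math.cos(y), y * math.sin(x)]

def Df(v):
    """Jacobian of f as derived in the solution (3 rows, 2 columns)."""
    x, y = v
    return [
        [0.0, 2 * y],
        [math.cos(y), -x * math.sin(y)],
        [y * math.cos(x), math.sin(x)],
    ]

def numerical_jacobian(func, v, eps=1e-6):
    """Central differences; entry [i][j] approximates d func_i / d v_j."""
    columns = []
    for j in range(len(v)):
        plus, minus = list(v), list(v)
        plus[j] += eps
        minus[j] -= eps
        columns.append([(p - m) / (2 * eps) for p, m in zip(func(plus), func(minus))])
    return [list(row) for row in zip(*columns)]

point = [0.7, -1.3]  # arbitrary test point, not taken from the exercise
max_error = max(
    abs(a - b)
    for row_a, row_b in zip(Df(point), numerical_jacobian(f, point))
    for a, b in zip(row_a, row_b)
)
print("largest deviation:", max_error)
```

A deviation far below the step size suggests the hand-computed matrix is consistent with the function at that point; it is a check, not a proof.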

Exercise 3.2. Let $f : \mathbb{R}^m \to \mathbb{R}^m$ be differentiable at $0$ and satisfy $f(0) = 0$. Define for all $n \in \mathbb{N}$ the function $f^n : \mathbb{R}^m \to \mathbb{R}^m$ recursively as $f^0(x) = x$ and $f^{n+1}(x) = f(f^n(x))$.
(a) If $g : \mathbb{R} \to \mathbb{R}$ is given as $g(x) = x^2$, compute $g^n(x)$ for all $n \in \mathbb{N}$.
(b) Compute $Df^0(0)$ and $Df^1(0)$. Use the chain rule to compute $Df^2(0)$ in terms of $Df(0)$.
(c) Guess an expression for $Df^n(0)$ and prove it using mathematical induction.
(d) If $g : \mathbb{R}^2 \to \mathbb{R}^2$ satisfies $g(0) = 0$ and
\[ Dg(0) = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}, \]
compute $Dg^n(0)$.

© Maurice Koster, UvA, 2025

Solution:

(a) We have
\[ g^2(x) = g(g(x)) = g(x^2) = (x^2)^2 = x^4, \qquad g^3(x) = g(g^2(x)) = g(x^4) = (x^4)^2 = x^8. \]
We conjecture that $g^n(x) = x^{2^n}$.
Proof by induction. The relation has already been shown for $n = 1, 2, 3$. Assume that it holds for $n$. Then
\[ g^{n+1}(x) = g(g^n(x)) = g\left(x^{2^n}\right) = \left(x^{2^n}\right)^2 = x^{2^{n+1}}. \]
Hence it is true for $n + 1$ as well, and by induction, for all $n$.


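(b) Since $f^0(x) = x$, we have
\[ Df^0(x) = \begin{pmatrix} \frac{\partial x_1}{\partial x_1} & \frac{\partial x_1}{\partial x_2} & \cdots & \frac{\partial x_1}{\partial x_m} \\ \frac{\partial x_2}{\partial x_1} & \frac{\partial x_2}{\partial x_2} & & \vdots \\ \vdots & & \ddots & \vdots \\ \frac{\partial x_m}{\partial x_1} & \cdots & \cdots & \frac{\partial x_m}{\partial x_m} \end{pmatrix} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & & \vdots \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & \cdots & 1 \end{pmatrix} = I, \]
so in particular $Df^0(0) = I$.
As $f^1(x) = f(f^0(x)) = f(x)$, it follows that $Df^1(x) = Df(x)$ and $Df^1(0) = Df(0)$.
As $f^2(x) = f(f^1(x)) = f(f(x))$, the chain rule implies
\[ Df^2(x) = Df(f(x))\,Df(x). \]
In particular, as $f(0) = 0$, we have
\[ Df^2(0) = Df(0)\,Df(0) = (Df(0))^2. \]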

(c) We conjecture that $f^n(0) = 0$ and $Df^n(0) = (Df(0))^n$. This is true for $n = 0$ and $n = 1$.
Assume that it has been proved for $n$. First we have $f^{n+1}(0) = f(f^n(0)) = f(0) = 0$.
The chain rule implies
\[ Df^{n+1}(x) = Df(f^n(x))\,Df^n(x) \]
and in particular
\[ Df^{n+1}(0) = Df(f^n(0))\,Df^n(0) = Df(0)\,Df^n(0) = Df(0)\,(Df(0))^n = (Df(0))^{n+1}. \]
We have proved the assertion also for $n + 1$; hence, by mathematical induction, it is true for all $n$.
(d)
\[ Dg^n(0) = (Dg(0))^n = \begin{pmatrix} 2^n & 0 \\ 0 & 3^n \end{pmatrix}. \]
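
The results of parts (a) and (d) can be illustrated with a small numerical experiment. This is only a sketch: the exercise fixes nothing about $g$ in part (d) beyond $g(0) = 0$ and $Dg(0)$, so the concrete map `g` below is a hypothetical example, and the helpers `iterate` and `numerical_jacobian` are illustrative names.

```python
import math

def iterate(f, n):
    """Return the n-fold composition f^n, where f^0 is the identity."""
    def f_n(x):
        for _ in range(n):
            x = f(x)
        return x
    return f_n

# Part (a): g(x) = x^2 should give g^n(x) = x^(2^n).
square = lambda x: x * x
part_a_ok = all(
    math.isclose(iterate(square, n)(1.5), 1.5 ** (2 ** n)) for n in range(6)
)

# Part (d): a hypothetical g with g(0) = 0 and Dg(0) = diag(2, 3).
def g(v):
    x, y = v
    return [2 * x + y ** 2, 3 * y + x * y]

def numerical_jacobian(func, v, eps=1e-6):
    """Central differences; entry [i][j] approximates d func_i / d v_j."""
    columns = []
    for j in range(len(v)):
        plus, minus = list(v), list(v)
        plus[j] += eps
        minus[j] -= eps
        columns.append([(p - m) / (2 * eps) for p, m in zip(func(plus), func(minus))])
    return [list(row) for row in zip(*columns)]

n = 4
jac = numerical_jacobian(iterate(g, n), [0.0, 0.0])
print(part_a_ok, jac)  # jac should be close to [[2**n, 0], [0, 3**n]]
```

The nonlinear terms of `g` do not affect the derivative at the origin, which is what the induction argument in part (c) predicts.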

Exercise 3.3.
Let $f : \mathbb{R}^3 \to \mathbb{R}^2$ be the affine function defined by $f(x) = Ax + b$, where
\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \]
and $b = (2, 4)$. Find $Df(x)$.

Solution:
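We have
\[ f(x) = \begin{pmatrix} f_1(x) \\ f_2(x) \end{pmatrix} = \begin{pmatrix} x_1 + 2x_2 + 3x_3 + 2 \\ 4x_1 + 5x_2 + 6x_3 + 4 \end{pmatrix}. \]
Then
\[ Df(x) = \begin{pmatrix} \frac{\partial}{\partial x_1} f_1(x) & \frac{\partial}{\partial x_2} f_1(x) & \frac{\partial}{\partial x_3} f_1(x) \\ \frac{\partial}{\partial x_1} f_2(x) & \frac{\partial}{\partial x_2} f_2(x) & \frac{\partial}{\partial x_3} f_2(x) \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} = A. \]
This should be no surprise. You are asked to prove it more formally in the next exercise.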
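
As a quick illustration of $Df(x) = A$ for this affine map, the sketch below approximates the Jacobian by central differences at an arbitrary point. The helper name `numerical_jacobian` and the test point are assumptions, not part of the exercise.

```python
A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [2.0, 4.0]

def f(x):
    """Affine map f(x) = A x + b from Exercise 3.3."""
    return [
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(A, b)
    ]

def numerical_jacobian(func, v, eps=1e-6):
    """Central differences; entry [i][j] approximates d func_i / d v_j."""
    columns = []
    for j in range(len(v)):
        plus, minus = list(v), list(v)
        plus[j] += eps
        minus[j] -= eps
        columns.append([(p - m) / (2 * eps) for p, m in zip(func(plus), func(minus))])
    return [list(row) for row in zip(*columns)]

jac = numerical_jacobian(f, [0.3, -1.2, 2.5])  # arbitrary test point
max_error = max(abs(jac[i][j] - A[i][j]) for i in range(2) for j in range(3))
print("largest deviation from A:", max_error)
```

Changing the test point should leave the result unchanged, since the derivative of an affine map does not depend on $x$.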

Exercise 3.4.
Let $f : \mathbb{R}^m \to \mathbb{R}^k$ be the affine function defined by $f(x) = Ax + b$, where $A$ is a $k \times m$ matrix and $b \in \mathbb{R}^k$. Find $Df(x)$.

Solution:

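We can find the derivatives for each of the coordinate functions
\[ f_i(x) = f(x)_i = (Ax + b)_i = (Ax)_i + b_i = \sum_{j=1}^m A_{ij} x_j + b_i. \]
We have
\[ \frac{\partial}{\partial x_t} f_i(x) = \frac{\partial}{\partial x_t}\left[ \sum_{j=1}^m A_{ij} x_j + b_i \right] = \sum_{j=1}^m \frac{\partial}{\partial x_t}\left[ A_{ij} x_j \right] + \frac{\partial}{\partial x_t} b_i = A_{it}. \]
Hence
\[ Df(x) = \begin{pmatrix} \frac{\partial}{\partial x_1} f_1(x) & \cdots & \frac{\partial}{\partial x_m} f_1(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial}{\partial x_1} f_k(x) & \cdots & \frac{\partial}{\partial x_m} f_k(x) \end{pmatrix} = A. \]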

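Exercise 3.5. Consider the two functions $f, g : \mathbb{R}^3 \to \mathbb{R}^2$ given by
\[ f(x) = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix} x + \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad g(x) = (x_1 x_2,\ x_2 x_3). \]
Define
\[ h(x) = f(x) \cdot g(x). \]
Calculate $Dh(1, 1, 1)$. Verify that $Dh(1, 1, 1) = f(1, 1, 1)^\top Dg(1, 1, 1) + g(1, 1, 1)^\top Df(1, 1, 1)$.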

Solution:

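• $f(x) = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} x_1 + 2x_2 + 3x_3 + 1 \\ 3x_1 + 2x_2 + x_3 \end{pmatrix}$

• $g(x) = (x_1 x_2,\ x_2 x_3)$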

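• $h(x) = (x_1 + 2x_2 + 3x_3 + 1) \cdot x_1 x_2 + (3x_1 + 2x_2 + x_3) \cdot x_2 x_3 = x_1 x_2 (x_1 + 2x_2 + 3x_3 + 1) + x_2 x_3 (3x_1 + 2x_2 + x_3)$

• $\nabla h(x) = \begin{pmatrix} x_2(x_1 + 2x_2 + 3x_3 + 1) + x_1 x_2 + 3x_2 x_3 \\ x_1(x_1 + 2x_2 + 3x_3 + 1) + 2x_1 x_2 + x_3(3x_1 + 2x_2 + x_3) + 2x_2 x_3 \\ 3x_1 x_2 + x_2(3x_1 + 2x_2 + x_3) + x_2 x_3 \end{pmatrix}$

• $\nabla h(1, 1, 1) = \begin{pmatrix} 7 + 1 + 3 \\ 7 + 2 + 6 + 2 \\ 3 + 6 + 1 \end{pmatrix} = \begin{pmatrix} 11 \\ 17 \\ 10 \end{pmatrix}$

• $f(1, 1, 1) = (7, 6)$, $g(1, 1, 1) = (1, 1)$

• $Df(1, 1, 1) = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}$

• $Dg(x_1, x_2, x_3) = \begin{pmatrix} x_2 & x_1 & 0 \\ 0 & x_3 & x_2 \end{pmatrix}$ so that $Dg(1, 1, 1) = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}$

• So check that
\[ Dh(1, 1, 1) = f(1, 1, 1)^\top Dg(1, 1, 1) + g(1, 1, 1)^\top Df(1, 1, 1) = \begin{pmatrix} 7 & 6 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix} + \begin{pmatrix} 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix} = \begin{pmatrix} 7 & 13 & 6 \end{pmatrix} + \begin{pmatrix} 4 & 4 & 4 \end{pmatrix} = \begin{pmatrix} 11 & 17 & 10 \end{pmatrix}, \]
which agrees with $\nabla h(1, 1, 1)^\top$.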
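
The arithmetic in this verification is easy to get wrong by hand, so a short numerical cross-check may help. The sketch below evaluates the product-rule expression at $(1, 1, 1)$ and compares it with a central-difference gradient of $h$; the helper names `row_times_matrix` and `numerical_gradient` are illustrative choices, not notation from the exercise.

```python
def f(x):
    x1, x2, x3 = x
    return [x1 + 2 * x2 + 3 * x3 + 1, 3 * x1 + 2 * x2 + x3]

def g(x):
    x1, x2, x3 = x
    return [x1 * x2, x2 * x3]

def h(x):
    """Inner product h(x) = f(x) . g(x)."""
    return sum(f_i * g_i for f_i, g_i in zip(f(x), g(x)))

Df = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]  # constant, since f is affine

def Dg(x):
    x1, x2, x3 = x
    return [[x2, x1, 0.0], [0.0, x3, x2]]

def row_times_matrix(v, M):
    """Row vector v times matrix M."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def numerical_gradient(func, x, eps=1e-6):
    """Central-difference approximation of the gradient of a scalar function."""
    grad = []
    for j in range(len(x)):
        plus, minus = list(x), list(x)
        plus[j] += eps
        minus[j] -= eps
        grad.append((func(plus) - func(minus)) / (2 * eps))
    return grad

p = [1.0, 1.0, 1.0]
product_rule = [
    a + b
    for a, b in zip(row_times_matrix(f(p), Dg(p)), row_times_matrix(g(p), Df))
]
finite_diff = numerical_gradient(h, p)
print(product_rule)  # [11.0, 17.0, 10.0]
print(finite_diff)   # should agree up to rounding
```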

Exercise 3.6. Extension product rule


Consider two differentiable mappings $f : \mathbb{R}^m \to \mathbb{R}^k$ and $g : \mathbb{R}^m \to \mathbb{R}^k$. Define the (inner) product function $h(x) = f(x) \cdot g(x)$. Prove that
\[ Dh(x) = f(x)^\top Dg(x) + g(x)^\top Df(x). \]
Hint: first figure out how this works if $k = 1$ and see the previous exercise.

Solution:

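We have $h(x) = \sum_{t=1}^k f_t(x) g_t(x)$ so that
\[ \begin{aligned} \frac{\partial}{\partial x_j} h(x) &= \frac{\partial}{\partial x_j} \left[ \sum_{t=1}^k f_t(x) g_t(x) \right] = \sum_{t=1}^k \frac{\partial}{\partial x_j} \left[ f_t(x) g_t(x) \right] \\ &= \sum_{t=1}^k \left( \frac{\partial f_t}{\partial x_j}(x)\, g_t(x) + \frac{\partial g_t}{\partial x_j}(x)\, f_t(x) \right) \\ &= \sum_{t=1}^k \frac{\partial f_t}{\partial x_j}(x)\, g_t(x) + \sum_{t=1}^k \frac{\partial g_t}{\partial x_j}(x)\, f_t(x) \\ &= \frac{\partial f}{\partial x_j}(x) \cdot g(x) + \frac{\partial g}{\partial x_j}(x) \cdot f(x). \end{aligned} \]
Then we may conclude that
\[ Dh(x) = g(x)^\top Df(x) + f(x)^\top Dg(x). \]
Note that in the third equality we used the product rule for single variable functions. Moreover, realize that $g(x)$ and $f(x)$ are column vectors in $\mathbb{R}^k$ and that $Df(x)$ and $Dg(x)$ are $k \times m$ matrices, so that $Dh(x)$ is a $1 \times m$ row vector.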
 
Exercise 3.7. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be given by $f(x) = x^\top A x$ where
\[ A = \begin{pmatrix} 1 & 2 \\ 0 & 4 \end{pmatrix}. \]
Determine $Df(x)$, directly and using the former exercise.

Solution:
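Write out in a scalar fashion
\[ f(x) = \begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} x_1 + 2x_2 \\ 4x_2 \end{pmatrix} = x_1(x_1 + 2x_2) + 4x_2^2 = x_1^2 + 2x_1 x_2 + 4x_2^2. \]
Hence, $Df(x) = \begin{pmatrix} 2x_1 + 2x_2 & 2x_1 + 8x_2 \end{pmatrix}$. Now use the former exercise with the factors $h(x) = x$ and $g(x) = Ax$, so that $f(x) = h(x) \cdot g(x)$. Then we have
\[ \begin{aligned} Df(x) &= h(x)^\top Dg(x) + g(x)^\top Dh(x) \\ &= \begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 0 & 4 \end{pmatrix} + \begin{pmatrix} x_1 + 2x_2 & 4x_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \\ &= \begin{pmatrix} x_1 & 2x_1 + 4x_2 \end{pmatrix} + \begin{pmatrix} x_1 + 2x_2 & 4x_2 \end{pmatrix} = \begin{pmatrix} 2x_1 + 2x_2 & 2x_1 + 8x_2 \end{pmatrix} \end{aligned} \]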

Notice that this gives the same answer.
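
The agreement between the two routes can also be checked numerically. The sketch below, with an arbitrarily chosen test point and illustrative function names, evaluates the direct formula, the product-rule expression $x^\top A + (Ax)^\top$, and a central-difference gradient of $f$.

```python
A = [[1.0, 2.0], [0.0, 4.0]]

def f(x):
    """Quadratic form f(x) = x^T A x."""
    return sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))

def Df_direct(x):
    """Derivative read off from f(x) = x1^2 + 2 x1 x2 + 4 x2^2."""
    x1, x2 = x
    return [2 * x1 + 2 * x2, 2 * x1 + 8 * x2]

def Df_product_rule(x):
    """x^T A + (A x)^T: the two terms of the product rule with h(x) = x, g(x) = A x."""
    xT_A = [sum(x[i] * A[i][j] for i in range(2)) for j in range(2)]
    Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
    return [a + b for a, b in zip(xT_A, Ax)]

def numerical_gradient(func, x, eps=1e-6):
    """Central-difference approximation of the gradient of a scalar function."""
    grad = []
    for j in range(len(x)):
        plus, minus = list(x), list(x)
        plus[j] += eps
        minus[j] -= eps
        grad.append((func(plus) - func(minus)) / (2 * eps))
    return grad

x = [1.5, -2.0]  # arbitrary test point
print(Df_direct(x), Df_product_rule(x), numerical_gradient(f, x))
```

At this point both closed forms evaluate to `[-1.0, -13.0]`, and the finite-difference values should match up to rounding.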

Exercise 3.8. Let $f : \mathbb{R}^m \to \mathbb{R}$ be given by $f(x) = x^\top A x$ where $A$ is some $m \times m$ matrix. Determine $Df(x)$. How does your answer simplify if $A$ is symmetric?

Solution:
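Consider a quadratic form $f(x) = x \cdot Ax$ where $A$ is a real (not necessarily symmetric) $m \times m$ matrix. We will show that $Df(x) = x^\top (A^\top + A)$, in two different ways:
(a) To see this, first rewrite $f(x)$: take $k \in \{1, 2, \ldots, m\}$ and split the sum into parts that depend on $x_k$ and parts that don't:
\[ \begin{aligned} f(x) = x^\top A x &= \sum_{i=1}^m x_i (Ax)_i = \sum_{i=1}^m x_i \sum_{j=1}^m a_{ij} x_j = \sum_{i=1}^m \sum_{j=1}^m a_{ij} x_i x_j \\ &= \sum_{j=1}^m a_{kj} x_k x_j + \sum_{i \neq k} \sum_{j=1}^m a_{ij} x_i x_j \\ &= a_{kk} x_k^2 + \sum_{j \neq k} a_{kj} x_k x_j + \sum_{i \neq k} a_{ik} x_i x_k + \sum_{i, j \neq k} a_{ij} x_i x_j. \end{aligned} \]
Now take the partial derivative with respect to the variable $x_k$:
\[ \begin{aligned} \frac{\partial}{\partial x_k} f(x) &= 2 a_{kk} x_k + \sum_{j \neq k} a_{kj} x_j + \sum_{i \neq k} a_{ik} x_i + 0 \\ &= \sum_{j=1}^m a_{kj} x_j + \sum_{i=1}^m a_{ik} x_i \\ &= (Ax)_k + (x^\top A)_k = (x^\top A^\top)_k + (x^\top A)_k \\ &= (x^\top (A^\top + A))_k. \end{aligned} \]
So we conclude that $Df(x) = x^\top (A^\top + A)$. Then for symmetric $A$, with $A^\top = A$, we obtain as a corollary that $Df(x) = 2x^\top A$.
(b) Use the extended product rule (see Exercise 3.6) for $h(x) = x \cdot Ax$, with factors $x$ and $Ax$. Since $D(x) = I$ and $D(Ax) = A$, we have
\[ Dh(x) = (Ax)^\top D(x) + x^\top D(Ax) = x^\top A^\top I + x^\top A = x^\top (A^\top + A), \]
or, written as a column vector, $\nabla h(x) = IAx + A^\top x = (A + A^\top)x$. The chain rule now gives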

Exercise 3.9. Application Econometrics

Let $A$ be a symmetric $m \times m$ matrix, and $f : \mathbb{R}^m \to \mathbb{R}$ the quadratic form $f(x) = x^\top A x$.
Consider a mapping $g : \mathbb{R}^k \to \mathbb{R}^m$ defined by $g(\beta) = y - X\beta$, where $X$ is an $m \times k$ matrix and $y \in \mathbb{R}^m$.
Here the matrix $X$ contains $m$ observations of $k$ explanatory variables, and $y$ contains the corresponding values of the response variable.
In practical applications one wants to minimize the function $h : \mathbb{R}^k \to \mathbb{R}$ defined as the composition of $f$ with $g$:
\[ h(\beta) := (f \circ g)(\beta) = (y - X\beta)^\top A (y - X\beta). \]
In standard OLS, the matrix $A$ is just the identity matrix. As we will see later, minimizing this expression requires the partial derivatives of the function $h$.
Find an expression for $Dh(\beta)$ in terms of $A$, $X$, $y$.

Solution:
Use the former exercise here. We have $Df(x) = 2x^\top A$ and $Dg(\beta) = -X$. So using the chain rule we get
\[ \begin{aligned} Dh(\beta) = D(f \circ g)(\beta) &= Df(g(\beta))\, Dg(\beta) \\ &= 2 g(\beta)^\top A (-X) = -2 (y - X\beta)^\top A X. \end{aligned} \]
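
To see the formula at work, the sketch below evaluates $-2(y - X\beta)^\top A X$ on a tiny made-up data set and compares it with a central-difference gradient of $h$. The numbers in `X`, `y`, `A` and `beta` are hypothetical and serve only as an illustration; `A` is chosen symmetric, as the exercise requires.

```python
# Hypothetical data: m = 3 observations, k = 2 explanatory variables.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 3.0]]
y = [1.0, 2.0, 2.5]
A = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 3.0]]  # symmetric weights

def residual(beta):
    """g(beta) = y - X beta."""
    return [
        y_i - sum(x_ij * b_j for x_ij, b_j in zip(row, beta))
        for row, y_i in zip(X, y)
    ]

def h(beta):
    """h(beta) = (y - X beta)^T A (y - X beta)."""
    r = residual(beta)
    return sum(r[i] * A[i][j] * r[j] for i in range(3) for j in range(3))

def Dh(beta):
    """Row vector -2 (y - X beta)^T A X from the solution."""
    r = residual(beta)
    rT_A = [sum(r[i] * A[i][j] for i in range(3)) for j in range(3)]
    return [-2 * sum(rT_A[i] * X[i][j] for i in range(3)) for j in range(2)]

def numerical_gradient(func, x, eps=1e-6):
    """Central-difference approximation of the gradient of a scalar function."""
    grad = []
    for j in range(len(x)):
        plus, minus = list(x), list(x)
        plus[j] += eps
        minus[j] -= eps
        grad.append((func(plus) - func(minus)) / (2 * eps))
    return grad

beta = [0.4, -0.7]  # arbitrary evaluation point
max_error = max(abs(a - b) for a, b in zip(Dh(beta), numerical_gradient(h, beta)))
print("largest deviation:", max_error)
```

Setting `A` to the identity matrix in this sketch gives the ordinary least squares case mentioned in the exercise.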

Exercise 3.10. Application: Neural Networks


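Consider the function $f : \mathbb{R}^m \to \mathbb{R}$
\[ f(w) = \sigma(w^\top x + b), \]
for $x \in \mathbb{R}^m$ and $b \in \mathbb{R}$, where
\[ \sigma(t) = \frac{1}{1 + e^{-t}}. \]
Calculate $\nabla f(w)$, and show that it can be expressed in terms of $\sigma$, $x$, and $b$.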

Solution:
$f$ is a differentiable function composed of two functions, an affine transformation of $w$ and a differentiable function $\sigma$.
Standard differentiation techniques give
\[ \sigma'(t) = e^{-t} \cdot \frac{1}{(1 + e^{-t})^2} = \frac{e^{-t}}{1 + e^{-t}} \cdot \frac{1}{1 + e^{-t}} = (1 - \sigma(t))\, \sigma(t). \]
Moreover, the chain rule for differentiation gives:
\[ Df(w) = \sigma'(w^\top x + b) \cdot D(w^\top x + b) = \sigma(w^\top x + b)(1 - \sigma(w^\top x + b)) \cdot x^\top. \]
So we have
\[ \nabla f(w) = Df(w)^\top = f(w)(1 - f(w))\, x. \]
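
The closed form $f(w)(1 - f(w))x$ is what makes this gradient cheap to evaluate: it reuses the forward value $f(w)$. The following is a minimal sketch with made-up values for `x`, `b` and `w`, comparing the closed form against central differences.

```python
import math

def sigma(t):
    """Logistic function 1 / (1 + e^(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

x = [0.5, -1.0, 2.0]  # hypothetical fixed input vector
b = 0.3               # hypothetical bias

def f(w):
    """f(w) = sigma(w^T x + b)."""
    return sigma(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)

def grad_f(w):
    """Closed form f(w) (1 - f(w)) x from the solution."""
    value = f(w)
    return [value * (1.0 - value) * x_i for x_i in x]

def numerical_gradient(func, v, eps=1e-6):
    """Central-difference approximation of the gradient of a scalar function."""
    grad = []
    for j in range(len(v)):
        plus, minus = list(v), list(v)
        plus[j] += eps
        minus[j] -= eps
        grad.append((func(plus) - func(minus)) / (2 * eps))
    return grad

w = [0.2, -0.4, 0.1]  # arbitrary weights
max_error = max(abs(a - c) for a, c in zip(grad_f(w), numerical_gradient(f, w)))
print("largest deviation:", max_error)
```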

Exercise 3.11. Application: Neural Networks


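Consider the function $f : \mathbb{R}^{m+1} \to \mathbb{R}$
\[ f(w, b) = \tanh(w^\top x + b), \]
for $x \in \mathbb{R}^m$ and $b \in \mathbb{R}$, where
\[ \tanh(t) = \frac{e^t - e^{-t}}{e^t + e^{-t}}. \]
Calculate $\nabla f(w, b)$ and express it in terms of $f$ and $x$.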

Solution:
$f$ is a differentiable function composed of two functions, an affine transformation of $(w, b)$ and a differentiable function $\tanh$.
Standard differentiation techniques give
\[ \tanh'(t) = \frac{(e^t + e^{-t})(e^t + e^{-t}) - (e^t - e^{-t})(e^t - e^{-t})}{(e^t + e^{-t})^2} = 1 - \tanh^2(t). \]
Moreover, the chain rule for differentiation gives:
\[ Df(w, b) = \tanh'(w^\top x + b) \cdot D(w^\top x + b) = (1 - \tanh^2(w^\top x + b)) \cdot \begin{pmatrix} x^\top & 1 \end{pmatrix}. \]

So we have
\[ \nabla f(w, b) = Df(w, b)^\top = \left(1 - (f(w, b))^2\right) \cdot \begin{pmatrix} x \\ 1 \end{pmatrix}. \]
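
Here the bias $b$ is treated as a variable, so the gradient has $m + 1$ components, the last one belonging to $b$. A short sketch, again with made-up numbers and illustrative names, checks the formula $(1 - f^2)\,(x, 1)$ against central differences.

```python
import math

x = [0.5, -1.0, 2.0]  # hypothetical fixed input vector

def f(params):
    """f(w, b) = tanh(w^T x + b); params holds w followed by b."""
    *w, b = params
    return math.tanh(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)

def grad_f(params):
    """Closed form (1 - f^2) * (x, 1) from the solution."""
    value = f(params)
    return [(1.0 - value ** 2) * c for c in x + [1.0]]

def numerical_gradient(func, v, eps=1e-6):
    """Central-difference approximation of the gradient of a scalar function."""
    grad = []
    for j in range(len(v)):
        plus, minus = list(v), list(v)
        plus[j] += eps
        minus[j] -= eps
        grad.append((func(plus) - func(minus)) / (2 * eps))
    return grad

params = [0.2, -0.4, 0.1, 0.3]  # arbitrary weights w followed by bias b
max_error = max(
    abs(a - c) for a, c in zip(grad_f(params), numerical_gradient(f, params))
)
print("largest deviation:", max_error)
```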

Exercise 3.12. Consider the functions $f : \mathbb{R}^2 \to \mathbb{R}^2$, $g : \mathbb{R}^3 \to \mathbb{R}^2$ as specified by
\[ g(x, y, z) = (x - y + z,\ x^2 + y^2), \qquad f(s, t) = (e^s + e^{-t},\ e^{-s} + e^t). \]
Let $h$ be the composition of $f$ with $g$, $h := f \circ g$. Determine $Dh(-1, 1, 2)$ using the chain rule for differentiation.

Solution:

 
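\[ Df(s, t) = \begin{pmatrix} e^s & -e^{-t} \\ -e^{-s} & e^t \end{pmatrix}, \qquad Dg(x, y, z) = \begin{pmatrix} 1 & -1 & 1 \\ 2x & 2y & 0 \end{pmatrix} \]
\[ \begin{aligned} Dh(x, y, z) &= Df(g(x, y, z))\, Dg(x, y, z) = \begin{pmatrix} e^{x - y + z} & -e^{-(x^2 + y^2)} \\ -e^{-x + y - z} & e^{x^2 + y^2} \end{pmatrix} \begin{pmatrix} 1 & -1 & 1 \\ 2x & 2y & 0 \end{pmatrix} \\ &= \begin{pmatrix} e^{x - y + z} - 2x e^{-(x^2 + y^2)} & -e^{x - y + z} - 2y e^{-(x^2 + y^2)} & e^{x - y + z} \\ -e^{-x + y - z} + 2x e^{x^2 + y^2} & e^{-x + y - z} + 2y e^{x^2 + y^2} & -e^{-x + y - z} \end{pmatrix} \end{aligned} \]
Substitute $(x, y, z) = (-1, 1, 2)$, so that $x - y + z = 0$ and $x^2 + y^2 = 2$, to conclude that
\[ Dh(-1, 1, 2) = \begin{pmatrix} 1 + 2e^{-2} & -1 - 2e^{-2} & 1 \\ -1 - 2e^{2} & 1 + 2e^{2} & -1 \end{pmatrix} \]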
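
The final matrix can be cross-checked by differentiating the composition numerically. This is a sketch only; `numerical_jacobian` is an illustrative helper, and the comparison uses the closed-form entries derived above.

```python
import math

def g(v):
    x, y, z = v
    return [x - y + z, x ** 2 + y ** 2]

def f(v):
    s, t = v
    return [math.exp(s) + math.exp(-t), math.exp(-s) + math.exp(t)]

def h(v):
    """Composition h = f o g."""
    return f(g(v))

def numerical_jacobian(func, v, eps=1e-6):
    """Central differences; entry [i][j] approximates d func_i / d v_j."""
    columns = []
    for j in range(len(v)):
        plus, minus = list(v), list(v)
        plus[j] += eps
        minus[j] -= eps
        columns.append([(p - m) / (2 * eps) for p, m in zip(func(plus), func(minus))])
    return [list(row) for row in zip(*columns)]

e2 = math.exp(2)
expected = [
    [1 + 2 / e2, -1 - 2 / e2, 1.0],
    [-1 - 2 * e2, 1 + 2 * e2, -1.0],
]
jac = numerical_jacobian(h, [-1.0, 1.0, 2.0])
max_error = max(
    abs(a - b) for row_a, row_b in zip(expected, jac) for a, b in zip(row_a, row_b)
)
print("largest deviation:", max_error)
```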
